1. Introduction

Consider observing a random sample $X^{(n)}$ of size $n$, or at noise level $n^{-1/2}$,
drawn from a distribution $P_f^n$ indexed by some unknown parameter $f \in \mathcal{F}$. The
Bayesian paradigm views the sample as having law $P_f^n$ conditionally on $f$, that
is, $X^{(n)} \mid f \sim P_f^n$, where $f$ is drawn from some prior probability distribution $\Pi$
on some $\sigma$-field $\mathcal{B}$ of $\mathcal{F}$. The random variable $f \mid X^{(n)}$ then has a law on $\mathcal{F}$ which
is known as the posterior distribution, denoted by $\Pi(\cdot \mid X^{(n)})$. Bayesian inference
on $f$ is then entirely based on this posterior distribution: it gives access to
point estimates for $f$, credible sets and tests in a natural way.

It is of interest to analyse the behaviour of $\Pi(\cdot \mid X^{(n)})$ under the frequentist
sampling assumption that $X^{(n)}$ is drawn from $P_{f_0}^n$ for some fixed nonrandom
$f_0 \in \mathcal{F}$. In particular, it seems important to understand to what extent
Bayesian procedures based on the posterior lead to valid frequentist inference.
Quite remarkably, if $\mathcal{F}$ is a finite-dimensional space then in most situations
posterior-based inference is not only valid from a frequentist perspective, but in
fact asymptotically optimal. Perhaps the most fundamental explanation for this
phenomenon is given by the Bernstein-von Mises (BvM) theorem, first discovered
by Laplace [26], developed by von Mises (see p.156f. in [40]) and put into
the framework of modern parametric statistics by Le Cam [27]. It states that,
under mild and universal assumptions on the prior, the posterior distribution
approximately equals a normal distribution on $\mathcal{F}$, centered at an efficient
estimator $\hat f_n$ for $f_0$, and with covariance $i(f_0)$ the Cramér-Rao information bound
in the statistical model considered:

$$\sup_{B \in \mathcal{B}} \left| \Pi(B \mid X^{(n)}) - N(\hat f_n, i(f_0))(B) \right| \to 0 \tag{1}$$

as $n \to \infty$ in $P_{f_0}^n$-probability. Practically this means that posterior-based
inference asymptotically coincides with inference based on standard efficient,
$1/\sqrt{n}$-consistent frequentist estimators of $f_0$, and that Bayesian methods can be
rigorously justified from an asymptotic frequentist point of view.

The last decade has seen remarkable activity in the development of nonparametric
Bayes procedures, where $\mathcal{F}$ is taken to be an infinite-dimensional space,
typically consisting of functions or infinite vectors: nonparametric regression,
classification, density estimation, normal means and Gaussian white noise models
come to mind, and a variety of nonparametric priors have been devised in the
literature for such models. Posteriors in such models can be computed efficiently
by algorithms such as MCMC, and they provide broadly applicable Bayesian
inferential tools for nonparametric problems. It is natural to ask whether an
analogue of (1) can still be proved in such situations, as it would give a general
justification for the use of nonparametric Bayes procedures. Although remarkable
progress has been made in the understanding of the frequentist properties
of nonparametric Bayes procedures (we refer here only to some of the key papers
such as [17, 18, 34, 36, 38] and references therein), a satisfactory answer
to the BvM-question seems not to have been found. A first reason is perhaps
that it is not immediately clear what $N(\hat f_n, i(f_0))$ should be replaced by in the
infinite-dimensional situation: Gaussian distributions over infinite-dimensional
spaces $\mathcal{F}$ are much more complex objects, and their existence (in the form
relevant here) depends on the topology that $\mathcal{F}$ is endowed with. Likewise, whether
$1/\sqrt{n}$-efficient estimators $\hat f_n$ for $f_0$ exist or not depends strongly on the notion of
distance on $\mathcal{F}$ that one employs, and many of the commonly used loss functions
in nonparametric statistics (such as $L^p$-type loss) are not admissible.

A first step to understand this phenomenon better is thus to look for loss
functions on $\mathcal{F}$ for which efficient frequentist estimators $\hat f_n$ with certain
Gaussian limit distributions exist. This leads naturally to the setting of empirical
processes: For example, in the situation where one observes a random sample
$X_1, \dots, X_n$ from a law $P$ on $[0,1]$ we know that the empirical measure
$P_n = n^{-1} \sum_{i=1}^n \delta_{X_i}$ is a $1/\sqrt{n}$-efficient estimator for $P$ in the space of bounded
functions $\ell^\infty(\mathcal{H})$ on any $P$-Donsker class $\mathcal{H}$ of functions $h : [0,1] \to \mathbb{R}$. More
concretely this means that for such $\mathcal{H}$

$$\|P_n - P\|_{\mathcal{H}} = \sup_{h \in \mathcal{H}} \left| \int_0^1 h \, d(P_n - P) \right| \tag{2}$$

is $O_P(n^{-1/2})$, in fact $\sqrt{n}(P_n - P)$ converges in distribution in $\ell^\infty(\mathcal{H})$ to a tight
Gaussian random variable $N(0, i(P))$ over $\ell^\infty(\mathcal{H})$, and the covariance structure
$i(P)$ of the limiting variable achieves the Cramér-Rao information bound for
the fully nonparametric model (see Section 3.11.1 in [39]).

A natural setting for nonparametric Bernstein-von Mises theorems is thus to
embed the parameter space $\mathcal{F}$ into an $\ell^\infty(\mathcal{H})$-type space. The purpose of the
present paper is to investigate this approach rigorously in the situation of the
Gaussian white noise model, and with $\mathcal{H}_s$ a ball in a Sobolev space of order
$s > 1/2$; this makes the mathematical analysis tractable without any severe
loss of conceptual generality. Our main results will imply that for a large and
relevant class of product priors $\Pi$ that satisfy mild assumptions, and which do
not require conjugacy, one has

$$\sup_{A \in \mathcal{A}} \left| \Pi(A \mid X^{(n)}) - N(\hat f_n, i(f_0))(A) \right| \to^{P_{f_0}^n} 0 \tag{3}$$

where $N$ is a Gaussian measure on $\ell^\infty(\mathcal{H}_s)$ centered at an efficient estimator $\hat f_n$
of $f_0$, both to be defined in a precise manner, and where the classes $\mathcal{A}$ consist of
measurable subsets of $\ell^\infty(\mathcal{H}_s)$ that have uniformly smooth boundaries for the
measure $N$. The result is proved by showing that the (suitably shifted) posterior
converges weakly (in $P_{f_0}^n$-probability) to a canonical Gaussian measure $\mathcal{N}$ in
$\ell^\infty(\mathcal{H}_s)$, and by exploiting the uniformity classes for weak convergence towards
$\mathcal{N}$. We should note that some restrictions on the class $\mathcal{A}$ are necessary, as one can
show that in the infinite-dimensional situation the Bernstein-von Mises theorem
cannot hold uniformly in all Borel sets of $\ell^\infty(\mathcal{H}_s)$ (see below Definition 1 for
more discussion). Our assumptions apply in particular to priors that produce
posteriors which achieve frequentist optimal contraction rates in stronger loss
functions (such as $L^2$-distance) and which resemble the state of the art prior
choices in the nonparametric Bayes literature.

Our abstract results clearly only gain relevance through the fact that we can
demonstrate their applicability: The general result (3) implies that posterior-based
credible regions give asymptotically exact frequentist confidence sets in
a variety of concrete problems of nonparametric inference, chosen to highlight
the scope of our techniques. Our examples, which are given in Section 2, include
weighted $\ell^2$-ellipsoid credible regions for the unknown parameter $f_0$, linear
functionals defined on $L^2$ such as moments of $f_0$, credible bands for the
Fourier coefficients of $f_0$, a general class of nonlinear functionals defined on $L^2$
such as the squared $L^2$-norm $\|f_0\|_2^2$, and simultaneous credible bands for the
auto-convolution $f_0 * f_0$.

We note that a key point in these applications is related to the notion of
the 'plug-in property' of a nonparametric estimator, coined by Bickel and Ritov
[3]. A standard nonparametric estimator that is rate-optimal in a standard loss
function (such as $L^p$-loss) is said to have the plug-in property if it simultaneously
achieves the $1/\sqrt{n}$-rate for a large class of linear functionals, that is, if (2)
holds with $P_n$ replaced by the nonparametric estimator. Standard frequentist
estimators such as kernel, wavelet and nonparametric maximum likelihood estimators
satisfy this property; in fact one can even prove a corresponding uniform
central limit theorem in $\ell^\infty(\mathcal{H})$ for such estimators, see Kiefer and Wolfowitz
[25], Nickl [29], Giné and Nickl [21, 22]. Our results imply that this is also true
in the Bayesian situation: The posterior contracts at the optimal rate in $L^2$-loss
and at the same time satisfies a Bernstein-von Mises theorem in $\ell^\infty(\mathcal{H}_s)$. Likewise,
formal Bayes estimators such as the posterior mean are efficient frequentist
estimators of $f_0$ in $\ell^\infty(\mathcal{H}_s)$ while achieving the optimal nonparametric rate for
$f_0$ in $L^2$.

There is some recent work on the BvM phenomenon for nonparametric procedures
that needs mentioning. Leahu [28] derives interesting results on the possibility
and impossibility of BvM-theorems for undersmoothing priors; his negative
results will be relevant below. His positive findings are, however, strongly
tied to the Gaussian conjugate situation, and do not address efficiency questions.
Rivoirard and Rousseau [31] consider BvM-type results for linear functionals of
certain models of probability density functions. A number of BvM-type results
have been obtained for the fixed finite-dimensional posterior with dimension
increasing to infinity: Ghosal [15] and Bontemps [6] consider regression with a
finite number of regressors, Ghosal [16] and Clarke and Ghosal [9] consider
exponential families, and the case of discrete probability distributions is treated in
Boucheron and Gassiat [7]. We believe that our approach allows for a unifying
framework for these results: We show that a 'functional' BvM-result holds, giving
access to many linear functionals in a uniform way. We finally note related
work on semiparametric BvM-results in Castillo [8] and Bickel and Kleijn [2].

The outline of this article is as follows: In the next two subsections we introduce
a general notion of the nonparametric Bernstein-von Mises phenomenon.
In Section 2 we show that when this phenomenon holds, posterior-based inference
is valid from a frequentist point of view in a variety of concrete examples
from nonparametric statistics. In Section 3 we prove that for a large class of
natural priors on $L^2$, the BvM phenomenon indeed occurs.



1.1. The Weak Nonparametric Bernstein - von Mises Phenomenon 

Let $L^2 := L^2([0,1])$ be the space of square integrable functions on $[0,1]$. For
$f \in L^2$, and $dW$ a standard white noise, consider observing a random trajectory
in the model

$$dX^{(n)}(t) = f(t)\,dt + \frac{1}{\sqrt{n}}\,dW(t), \quad t \in [0,1]. \tag{4}$$

Let $\Pi$ be a prior Borel probability distribution on $L^2$ and let $\Pi(\cdot \mid X^{(n)})$ be the
posterior distribution on $L^2$ given the observed trajectory $X^{(n)}$.
Except in conjugate situations the proof of a Bernstein-von Mises type
result rests typically on the fact that efficient estimation at the rate $1/\sqrt{n}$ is
possible. In the nonparametric situation this rules out $L^p$-type loss functions,
but leads one to consider weaker norms of the type $\|\cdot\|_{\mathcal{H}}$ discussed in (2). For
the particular choice of $\mathcal{H}_s$ equal to an order-$s$ Sobolev ball we can understand
this better by using simple but useful Hilbert space duality arguments in the
nested scale of Sobolev spaces $\{H_2^r\}_{r \in \mathbb{R}}$ on $[0,1]$: We define these in precise detail
below, but note for the moment that $H_2^0 = L^2$ and that the norm-continuous
imbeddings

$$H_2^r \subset H_2^t, \quad r > t,$$

hold, so to weaken the norm beyond $L^2$ means that we should decrease $r$ to
be negative. For $s > 0$ the space $H_2^{-s}$ can be realised in an isometric way as
a closed subspace of $\ell^\infty(\mathcal{H}_s)$. Although we shall never use this isometry and
stick to negative order Sobolev spaces from now on, it heuristically explains the
connection to the discussion surrounding (2) above. Some more thought shows
that one should increase $s$ so far that the Gaussian experiment in (4) is well
defined as a tight random element of $H_2^{-s}$. This happens precisely as soon as $s$
exceeds $1/2$; in fact we show below that the random trajectory $dX^{(n)}$ defines a
tight Gaussian Borel random variable $\mathbb{X}^{(n)}$ on $H_2^{-s}$ with mean $f$ and covariance
$n^{-1} I$. Thus if we denote by $\mathbb{W}$ the centered Gaussian Borel random variable on
$H_2^{-s}$ with covariance $I$, then (4) can be written as

$$\mathbb{X}^{(n)} = f + \frac{1}{\sqrt{n}}\,\mathbb{W}, \tag{5}$$

a natural Gaussian shift experiment in $H_2^{-s}$. One can show moreover that $\mathbb{X}^{(n)}$
is an efficient estimator for $f$ for the loss function of $H_2^{-s}$.

Now any prior and posterior probability distribution on $L^2$ defines a tight
Borel probability measure on $H_2^{-s}$ simply by the continuous injection $L^2 \subset H_2^{-s}$.
On $H_2^{-s}$ and for $z \in H_2^{-s}$, define the measurable transformation

$$\tau_z : f \mapsto \sqrt{n}\,(f - z).$$

Let $\Pi_n = \Pi(\cdot \mid X^{(n)})$ be the posterior distribution on $H_2^{-s}$ and let $\Pi_n \circ \tau_{\mathbb{X}^{(n)}}^{-1}$ be
its image under $\tau_{\mathbb{X}^{(n)}}$. The shape of $\Pi_n \circ \tau_{\mathbb{X}^{(n)}}^{-1}$ reveals how the posterior
concentrates on $1/\sqrt{n}$-$H_2^{-s}$-neighborhoods of the efficient estimator $\mathbb{X}^{(n)}$. To compare
probability distributions on $H_2^{-s}$ we may use any metric for weak convergence
of probability measures, and we choose the bounded Lipschitz metric here for
convenience (it is defined in Section 4.2 below). Let $\mathcal{N}$ be the standard Gaussian
probability measure on $H_2^{-s}$ with mean zero and covariance $I$; its existence is
ensured in Section 1.2.

Definition 1. Consider data generated from equation (4) under a fixed function
$f_0$, and denote by $P_{f_0}^n$ the distribution of $\mathbb{X}^{(n)}$. Let $s > 1/2$ and let $\beta$ be the
bounded-Lipschitz metric for weak convergence of probability measures on $H_2^{-s}$.
We say that a prior $\Pi$ satisfies the weak Bernstein-von Mises phenomenon in
$H_2^{-s}$ if, as $n \to \infty$,

$$\beta\left(\Pi_n \circ \tau_{\mathbb{X}^{(n)}}^{-1}, \mathcal{N}\right) \to^{P_{f_0}^n} 0. \tag{6}$$

We note that the fact that the result is phrased in a way in which $\mathcal{N}$ is
independent of $n$ is important. The reason is that the metric $\beta$ does not induce
a uniformity structure for the topology of weak convergence (see the Remark
on p.413 in [11]).

Thus when the weak Bernstein-von Mises phenomenon holds the posterior
necessarily has the approximate shape of an infinite-dimensional Gaussian
distribution. Moreover, we require this Gaussian distribution to equal $\mathcal{N}$, the
canonical choice in view of efficiency considerations. The covariance of $\mathcal{N}$ is
the Cramér-Rao bound for estimating $f_0$ in the Gaussian shift experiment (5)
in $H_2^{-s}$-loss, and we shall see how this carries over to sufficiently regular
real-valued functionals $\Psi(f_0)$, see Section 2.4 below.

One may ask by analogy to the finite-dimensional situation whether a strong
Bernstein-von Mises phenomenon, where $\beta$ is replaced by the total variation
norm, can be proved. It follows from Theorem 2 in [28] that already in the
Gaussian conjugate situation such a result is not possible unless one restricts to
very specific priors (which in particular do not satisfy the plug-in property that
will be relevant below).

Now with weak instead of total variation convergence we cannot infer that
$\Pi_n \circ \tau_{\mathbb{X}^{(n)}}^{-1}$ and $\mathcal{N}$ are approximately the same for every Borel set in $H_2^{-s}$, but
only for sets $B$ that are continuity sets for the probability measure $\mathcal{N}$. As we
shall see this includes, for instance, any fixed norm ball $B(0, M)$ in $H_2^{-s}$, so that
for $n$ large

$$\Pi\left(B(\mathbb{X}^{(n)}, M/\sqrt{n}) \mid X^{(n)}\right) \approx \mathcal{N}(B(0, M))$$

with $P_{f_0}^n$-probability close to one. For statistical applications of the
Bernstein-von Mises phenomenon one typically needs some uniformity in $B$, and this is
where total variation results would be particularly useful. Weak convergence in
$H_2^{-s}$ implies that $\Pi_n \circ \tau_{\mathbb{X}^{(n)}}^{-1}$ is close to $\mathcal{N}$ uniformly in certain classes of subsets
of $H_2^{-s}$ whose boundaries are sufficiently regular relative to the measure $\mathcal{N}$ (see
Subsection 4.2), and we show below how this allows for enough uniformity to
deal with a variety of concrete nonparametric statistical problems.

The Bernstein-von Mises phenomenon in (6) will often be complemented
by convergence of moments, that is, convergence of the Bochner integrals (e.g.,
p.100 in [1])

$$\int f \, d\left(\Pi_n \circ \tau_{\mathbb{X}^{(n)}}^{-1}\right)(f) \to^{P_{f_0}^n} \int f \, d\mathcal{N}(f) = 0$$

as $n \to \infty$ in $H_2^{-s}$. Since the integral on the left equals $\sqrt{n}(\bar f_n - \mathbb{X}^{(n)})$, this
implies that the posterior mean $\bar f_n$ of $\Pi_n$ satisfies

$$\|\bar f_n - \mathbb{X}^{(n)}\|_{-s,2} = o_P(n^{-1/2}), \tag{7}$$

so in semiparametric terminology the posterior mean is asymptotically linear
in $H_2^{-s}$ with respect to $\mathbb{X}^{(n)}$, and in particular efficient for $f_0$. Therefore, if (7)
holds, centering Bayesian credible sets at $\bar f_n$ instead of $\mathbb{X}^{(n)}$ makes no asymptotic
difference in $H_2^{-s}$.
1.2. Sobolev Spaces and White Noise

Before we proceed we need to fix ideas and formally introduce negative order
Sobolev spaces. Their definition will be given in terms of orthonormal bases of
$L^2$ which will also be relevant for the assumptions on the priors we need.

Denote by $\langle f, g \rangle = \int_0^1 f(x) g(x)\,dx$ the standard inner product on $L^2$. We shall
work with a general orthonormal basis of $L^2$ that satisfies the following weak
regularity condition. While notationally it reflects a wavelet type basis $\{\psi_{lk} :
l \ge J_0, 0 \le k \le 2^l - 1\}$ of CDV-type [10], with notational convention $\psi_{J_0 k} = \varphi_k$,
it also includes the standard trigonometric basis $\psi_{lk}(x) = e_l(x) = e^{2\pi i l x}$ and
bases of standard Karhunen-Loève expansions.

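Definition 2. Let $S \in \mathbb{N}$. By an $S$-regular basis $\{\psi_{lk} : l \in \mathcal{L}, k \in \mathcal{Z}_l\}$ of $L^2$
with index sets $\mathcal{L} \subset \mathbb{Z}$, $\mathcal{Z}_l \subset \mathbb{Z}$ and characteristic sequence $a_l$ we shall mean any
of the following:

a) $\psi_{lk} \equiv e_l$ is $S$-times differentiable with all derivatives in $L^2$, $|\mathcal{Z}_l| = 1$, $a_l =
|l| \vee 1$, and $\{e_l : l \in \mathcal{L}\}$ forms an orthonormal basis of $L^2$.

b) $\psi_{lk}$ is $S$-times differentiable with all derivatives in $L^2$, $\mathcal{L} = \mathbb{N} \cup \{0\}$, $a_l =
|\mathcal{Z}_l| = 2^l$, and $\{\psi_{lk} : l \in \mathcal{L}, k \in \mathcal{Z}_l\}$ is an orthonormal basis of $L^2$.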

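Define for $0 \le s < S$ the Sobolev spaces as

$$H_2^s := H_2^s([0,1]) := \Big\{ f \in L^2([0,1]) : \|f\|_{s,2}^2 := \sum_{l \in \mathcal{L}} a_l^{2s} \sum_{k \in \mathcal{Z}_l} |\langle \psi_{lk}, f \rangle|^2 < \infty \Big\},$$

which are Hilbert spaces that may depend on the basis functions used, but
since for the examples mentioned above this is not the case we suppress it in
the notation. Moreover

$$\Big\{ \sum_{l \in \mathcal{L}'} \sum_{k \in \mathcal{Z}_l'} c_{lk} \psi_{lk} : c_{lk} \in \mathbb{C},\ \mathcal{Z}_l' \subset \mathcal{Z}_l,\ \mathcal{L}' \subset \mathcal{L} \text{ finite} \Big\}$$

forms a dense subset of $H_2^s$: For fixed $\mathcal{L}'$ finite, $\mathcal{Z}_l'$ and $c_{lk} = \langle f, \psi_{lk} \rangle$ these sums
are precisely all the finite-dimensional $L^2$-projections $\pi_V(f)$ of $f \in L^2$ onto $V$,
where $V = V_{\mathcal{L}', \mathcal{Z}'} = \mathrm{span}\{\psi_{lk} : l \in \mathcal{L}', k \in \mathcal{Z}_l'\}$.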
For $s > 0$ we define the dual space

$$H_2^{-s}([0,1]) := (H_2^s([0,1]))^*.$$

Using standard duality arguments (as in Proposition 9.16 in [13]) one shows the
following: $H_2^{-s}$ consists precisely of those linear forms $L$ acting on $H_2^s$ for which
the $\|L\|_{-s,2}$-norms (defined as above also for negative $s$) are finite, where now
$\langle \psi_{lk}, L \rangle = L(\psi_{lk})$, well defined since $S > s$, $\psi_{lk} \in H_2^s$. In fact the so-defined norm
$\|\cdot\|_{-s,2}$ is equivalent to the standard operator norm on $(H_2^s[0,1])^*$. Moreover
every $f \in L^2$ gives rise to a continuous linear form on $H_2^s$ by using the $\langle \cdot, \cdot \rangle$
duality, so we can view $L^2$ as a subspace of $H_2^{-s}$. By reflexivity of $H_2^s$ one
concludes

$$(H_2^{-s}([0,1]))^* = H_2^s([0,1])$$

up to isomorphism, that is, any linear continuous map $K : H_2^{-s} \to \mathbb{R}$ is of the
form $K : L \mapsto L(g)$ for some $g \in H_2^s$, and if $L$ itself is a functional coming from
integrating against an $L^2$-function $f_L$, then $L(g) = \langle g, f_L \rangle$.

For any $f \in H_2^s \subset L^2$ ($s > 0$) and $dW$ standard white noise we have a random
linear application

$$\mathbb{W} : f \mapsto \int_0^1 f(t)\,dW(t) \sim N(0, \|f\|_2^2). \tag{8}$$

For any $s > 1/2$, the $\|\mathbb{W}\|_{-s,2}$-norm is finite almost surely since, by Fubini's
theorem, for $g_{lk}$ independent $N(0,1)$ variables,

$$E\|\mathbb{W}\|_{-s,2}^2 = \sum_{l} a_l^{-2s} \sum_{k} E g_{lk}^2 < \infty,$$

so $\mathbb{W} \in H_2^{-s}$ almost surely, measurable for the cylindrical $\sigma$-algebra, and by
separability of $H_2^{-s}$ also for the Borel $\sigma$-algebra (p. 374 in [5]). By Ulam's theorem
(Theorem 7.1.4 in [11]), $\mathbb{W}$ is thus tight in $H_2^{-s}$. The Gaussian variable $\mathbb{W}$ has
mean zero and covariance $I$ diagonal for the $L^2$-inner product,

$$E\,\mathbb{W}(g)\mathbb{W}(h) = \langle g, h \rangle, \quad \forall g, h \in H_2^s.$$

We call the law $\mathcal{N}$ of $\mathbb{W}$ a standard, or canonical, Gaussian probability measure
on the Hilbert space $H_2^{-s}$ (note that it is the isonormal Gaussian measure for the
inner product of $L^2$ but not for the one of $H_2^{-s}$). In the same way the random
trajectory $dX^{(n)}$ from (4) defines a tight Gaussian Borel random variable $\mathbb{X}^{(n)}$
on $H_2^{-s}$ with mean $f_0$ and covariance $n^{-1} I$.
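The finiteness of $E\|\mathbb{W}\|_{-s,2}^2$ for $s > 1/2$ can be checked numerically. The following is a minimal sketch, assuming a wavelet-type basis as in Definition 2 b), so that $a_l = |\mathcal{Z}_l| = 2^l$ and the expected squared norm reduces to the geometric series $\sum_{l \ge 0} 2^{l(1-2s)}$. The function names `expected_sq_norm` and `limit_sq_norm` are illustrative and do not come from the paper.

```python
def expected_sq_norm(s: float, max_level: int) -> float:
    """Partial sum over levels l = 0..max_level of a_l^{-2s} * |Z_l|.

    Assumes the wavelet-type case a_l = |Z_l| = 2^l, so each level
    contributes 2^{l(1 - 2s)} to E||W||^2_{-s,2}.
    """
    return sum(2.0 ** (level * (1.0 - 2.0 * s)) for level in range(max_level + 1))


def limit_sq_norm(s: float) -> float:
    """Closed form of the full series, which converges only for s > 1/2."""
    if s <= 0.5:
        raise ValueError("the series diverges unless s > 1/2")
    return 1.0 / (1.0 - 2.0 ** (1.0 - 2.0 * s))


if __name__ == "__main__":
    for s in (0.5, 0.75, 1.0):
        print(s, expected_sq_norm(s, 20))
```

At the boundary $s = 1/2$ every level contributes exactly one, so the partial sums grow linearly in the number of levels, which illustrates why the threshold $1/2$ appears.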

2. Confidence Sets for Nonparametric Bayes Procedures 
2.1. Weighted $L^2$ Credible Ellipsoids

For $s > 1/2$, denote by

$$B(g, r) = \{ f \in H_2^{-s} : \|g - f\|_{-s,2} \le r \}$$

the norm ball in $H_2^{-s}$ of radius $r$ centered at $g$. In terms of an orthonormal basis
$\{\psi_{lk}\}$ of $L^2$ from Definition 2 this corresponds to $L^2$-ellipsoids of the form

$$\Big\{ \{c_{lk}\} : \sum_{l,k} a_l^{-2s} |c_{lk} - \langle g, \psi_{lk} \rangle|^2 \le r^2 \Big\}$$

where coefficients in the tail are downweighted by $a_l^{-2s}$. A frequentist goodness
of fit test of a null hypothesis $H_0 : f = f_0$ could for instance be based on
the test statistic $\|f_0 - \mathbb{X}^{(n)}\|_{-s,2}$, resembling in nature a Kolmogorov-Smirnov-type
procedure (as it has power against arbitrary alternatives $f$). In directional
statistics these are sometimes called Sobolev statistics/tests, see [19, 24].
A Bayesian approach consists in using the quantiles of the posterior directly,
that is, to solve for $M_n = M(X^{(n)}, \alpha)$ such that

$$\Pi\left(f : \|f - \mathbb{X}^{(n)}\|_{-s,2} \le M_n/\sqrt{n} \,\middle|\, X^{(n)}\right) = 1 - \alpha \tag{9}$$

where $0 < \alpha < 1$ is some fixed significance level. The resulting credible set

$$C_n = \{ f : \|f - \mathbb{X}^{(n)}\|_{-s,2} \le M_n/\sqrt{n} \} \tag{10}$$

is a random $\|\cdot\|_{-s,2}$-ball. The weak Bernstein-von Mises phenomenon in $H_2^{-s}$
implies that this credible ball asymptotically coincides with the exact $(1 - \alpha)$-confidence
set built using the efficient estimator $\mathbb{X}^{(n)}$ for $f_0$; in particular the
width of the credible ball is of order $O_P(n^{-1/2})$, and $M_n$ converges to a finite
constant in probability.

Theorem 1. Suppose the weak Bernstein-von Mises phenomenon in the sense
of Definition 1 holds. Let $C_n$ be the credible region from (10) with $M_n$ chosen
as in (9). Then

$$P_{f_0}^n(f_0 \in C_n) \to 1 - \alpha$$

as $n \to \infty$. If in addition (7) holds then the same result holds true if $C_n$ is
centered at the posterior mean $\bar f_n$ of $\Pi(\cdot \mid X^{(n)})$.

One may wish to intersect $C_n$ further with the support of $\Pi(\cdot \mid X^{(n)})$. Theorem
1 holds for such credible sets as well, as long as $f_0$ is in the $H_2^{-s}$-support of
$\Pi(\cdot \mid X^{(n)})$ (which in all natural situations will be the case).
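To make the construction in (9) and (10) concrete, here is a minimal Monte Carlo sketch for computing $M_n$. It rests on assumptions that are not part of the results above: a conjugate Gaussian product prior $\theta_{lk} \sim N(0, \sigma_l^2)$ (chosen only because its posterior is available in closed form, whereas the results here do not require conjugacy), a wavelet-type basis with $a_l = 2^l$, and truncation to finitely many levels. The names `posterior_params` and `credible_radius` are hypothetical.

```python
import math
import random


def posterior_params(x_lk, n, sigma_l):
    """Posterior mean and sd of one coefficient under the ASSUMED prior N(0, sigma_l^2).

    The observation model is X_lk = theta_lk + eps_lk / sqrt(n), eps_lk ~ N(0, 1).
    """
    shrink = n * sigma_l ** 2 / (1.0 + n * sigma_l ** 2)
    return shrink * x_lk, math.sqrt(sigma_l ** 2 / (1.0 + n * sigma_l ** 2))


def credible_radius(x, n, s, sigma, alpha=0.05, draws=2000, seed=0):
    """Monte Carlo approximation of M_n in (9).

    x     : dict mapping level l to the list of observed coefficients X_lk
    sigma : function mapping level l to the prior standard deviation sigma_l
    Returns the empirical (1 - alpha)-quantile of sqrt(n) * ||f - X||_{-s,2}
    over posterior draws of f, using weights a_l^{-2s} with a_l = 2^l.
    """
    rng = random.Random(seed)
    scaled_norms = []
    for _ in range(draws):
        sq = 0.0
        for level, coeffs in x.items():
            weight = (2.0 ** level) ** (-2.0 * s)
            for x_lk in coeffs:
                mean, sd = posterior_params(x_lk, n, sigma(level))
                sq += weight * (rng.gauss(mean, sd) - x_lk) ** 2
        scaled_norms.append(math.sqrt(n * sq))
    scaled_norms.sort()
    index = min(draws - 1, max(0, math.ceil((1.0 - alpha) * draws) - 1))
    return scaled_norms[index]


if __name__ == "__main__":
    data = {0: [0.3], 1: [0.1, -0.2], 2: [0.0, 0.05, -0.05, 0.02]}
    print(credible_radius(data, n=100, s=1.0, sigma=lambda l: 2.0 ** (-1.5 * l)))
```

The credible ball (10) is then the set of $f$ with $\|f - \mathbb{X}^{(n)}\|_{-s,2} \le M_n/\sqrt{n}$. For non-conjugate priors the Gaussian draws would have to be replaced by output from a posterior sampler.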

2.2. Credible bands for Fourier coefficients 

Suppose one wants to recover the Fourier coefficients

$$\hat f(m) = \int_0^1 e^{2\pi i m t} f(t)\,dt, \quad m \in \mathbb{Z},\ |m| \le N,$$

up to some fixed frequency $N \in \mathbb{N}$, and is interested in inference on all the
$\{\hat f(m)\}_{|m| \le N}$ simultaneously. It is a 'discrete' version of the problem studied in
the sampling setting in [12], where several applications can be found. Let

$$\ell_N^\infty = \Big\{ \hat f : [-N, N] \cap \mathbb{Z} \to \mathbb{C} : \|\hat f\|_{\infty,N} := \max_{|m| \le N} |\hat f(m)| < \infty \Big\}$$

be the space of bounded complex-valued functions on the integers $m$, $|m| \le N$.
The Fourier transform $\mathcal{F} : f \mapsto \{\hat f(m)\}_{|m| \le N}$ maps $L^2$ into $\ell_N^\infty$.
Given the empirical Fourier coefficients

$$\phi_n(m) = \int_0^1 e^{2\pi i m t}\,dX^{(n)}(t), \quad m \in \mathbb{Z},$$

whose restriction to the integers $m$, $|m| \le N$, is an element of $\ell_N^\infty$ almost surely,
one solves for $M_n$ in

$$\Pi_n \circ \mathcal{F}^{-1}\Big( g : \max_{|m| \le N} |g(m) - \phi_n(m)| \le M_n/\sqrt{n} \Big) = 1 - \alpha \tag{11}$$

where $\Pi_n = \Pi(\cdot \mid X^{(n)})$ is the posterior, and constructs a credible band

$$C_n = \Big\{ g : \max_{|m| \le N} |g(m) - \phi_n(m)| \le M_n/\sqrt{n} \Big\} \tag{12}$$

of functions $g : [-N, N] \cap \mathbb{Z} \to \mathbb{C}$. Again, the proof of the following theorem
establishes that $M_n$ converges in probability to a finite constant.

Theorem 2. Suppose the weak Bernstein-von Mises phenomenon in the sense
of Definition 1 holds. Let $C_n$ be the credible band from (12) with $M_n$ chosen as
in (11). Then

$$P_{f_0}^n\left( \{\hat f_0(m)\}_{|m| \le N} \in C_n \right) \to 1 - \alpha$$

as $n \to \infty$. If in addition (7) holds then the same result holds true if $C_n$ is
centered at $\mathcal{F} \bar f_n$ where $\bar f_n$ is the posterior mean of $\Pi(\cdot \mid X^{(n)})$.



2.3. Credible bands for self-convolutions

Suppose now we are interested in estimating the function

$$f_0 * f_0 = \int_0^1 f_0(\cdot - t) f_0(t)\,dt$$

where addition is mod-1 (so the convolution of $f_0$ with itself on the unit circle).
The related problem in density estimation was studied in the papers [14, 20,
29, 30, 32, 33], where it is shown that $f_0 * f_0$ can be estimated at the $1/\sqrt{n}$-rate
even when this is impossible for $f_0$. See particularly [14] for applications.
Assume $f_0$ is one-periodic and contained in $H_2^s$ for some $s > 1/2$, and that
the posterior is supported in $L^2([0,1)) \equiv L^2_{per}([0,1))$ which, in this subsection,
denotes the subspace of $L^2$ consisting of one-periodic functions. We will assume
that the basis used to define $H_2^s$ is such that $\big(\sum_m |\hat f(m)|^2 (1 + |m|)^{2s}\big)^{1/2}$ is an
equivalent norm on $H_2^s$ (which is the case for CDV- or periodised wavelets and
trigonometric bases of $L^2$).

By standard properties of convolutions, $\kappa : f \mapsto f * f$ maps $L^2([0,1))$ into
$C([0,1))$, the space of bounded continuous periodic functions on $[0,1)$ equipped
with the uniform norm $\|\cdot\|_\infty$. If $\Pi_n = \Pi(\cdot \mid X^{(n)})$ with posterior mean $\bar f_n \in
L^2([0,1))$, we can construct a confidence band for $f_0 * f_0$ by solving for $M_n$ such
that

$$\Pi_n \circ \kappa^{-1}\left( g : \|g - \bar f_n * \bar f_n\|_\infty \le M_n/\sqrt{n} \right) = 1 - \alpha \tag{13}$$

with resulting credible band

$$C_n = \{ g : \|g - \bar f_n * \bar f_n\|_\infty \le M_n/\sqrt{n} \}. \tag{14}$$
Theorem 3. Suppose the weak Bernstein-von Mises phenomenon in the sense
of Definition 1 holds, and that $f_0 \in H_2^s$. Assume moreover (7) and that for some
sequence $r_n = o(n^{-1/2})$,

$$\|\bar f_n - f_0\|_2^2 = O_P(r_n), \quad \Pi_n(f : \|f - f_0\|_2^2 > r_n) = o_P(1).$$

Let $C_n$ be the credible band from (14) with $M_n$ chosen as in (13). Then

$$P_{f_0}^n(f_0 * f_0 \in C_n) \to 1 - \alpha$$

as $n \to \infty$.

If $f_0$ is $s$-Hölder on $[0,1)$ for some $s > 1/2$ then the priors from Condition 1
with $\sigma_l$ and $\gamma = s$ chosen as in Remark 1 are admissible in the above theorem
with $r_n = n^{-2s/(2s+1)}$, cf. Corollaries 1, 2 below and the results in Subsection
3.4. Note that any choice $s > 1/2$ will do to obtain order $1/\sqrt{n}$-width credible
bands.

2.4. Plug-in credible sets for functionals

If the goal of statistical inference is a possibly nonlinear real-valued functional
$\Psi(f_0)$ of $f_0$ one can use the posterior $\Pi(\cdot \mid X^{(n)})$ in a natural way to construct
'plug-in' credible sets.

2.4.1. Linear functionals

Let $L$ be any linear form on $L^2$ given by

$$L(f) = \langle f, g_L \rangle = \int_0^1 f(t) g_L(t)\,dt, \quad f \in L^2,$$

where $g_L \in H_2^s$, $s > 1/2$, and $g_L \ne 0$. If $\Pi_n = \Pi(\cdot \mid X^{(n)})$ is the posterior one may
construct credible sets for $L(f_0)$ based on the induced law $\Pi_n^L = \Pi_n \circ L^{-1}$ in
several ways: For example one solves for $M_n = M(X^{(n)}, L, \alpha)$ in

$$\Pi_n^L\left( z : |z - L(\mathbb{X}^{(n)})| \le M_n/\sqrt{n} \right) = 1 - \alpha \tag{15}$$

which gives rise to the credible set

$$C_n = \{ z : |z - L(\mathbb{X}^{(n)})| \le M_n/\sqrt{n} \} \tag{16}$$

for $L(f_0)$. An alternative way to build the credible set is discussed below in a
more general setting.

Theorem 4. Suppose the weak Bernstein-von Mises phenomenon in the sense
of Definition 1 holds. Let $L = \langle \cdot, g_L \rangle$ be a linear functional on $L^2$ where
$0 \ne g_L \in H_2^s$, $s > 1/2$. Let $\beta_{\mathbb{R}}$ be the bounded-Lipschitz metric for weak convergence
on $\mathbb{R}$, and define $\theta_t : x \mapsto \sqrt{n}(x - t)$ for $t, x \in \mathbb{R}$. Then

$$\beta_{\mathbb{R}}\left( \Pi_n^L \circ \theta_{L(\mathbb{X}^{(n)})}^{-1},\ N(0, \|g_L\|_2^2) \right) \to^{P_{f_0}^n} 0.$$

Moreover let $C_n$ be the credible region from (16) with $M_n$ chosen as in (15).
Then

$$P_{f_0}^n(L(f_0) \in C_n) \to 1 - \alpha$$

as $n \to \infty$. If in addition (7) holds then the same result holds true if $C_n$ is
centered at $L(\bar f_n)$ where $\bar f_n$ is the posterior mean of $\Pi(\cdot \mid X^{(n)})$.

The proofs imply that $M_n$ converges to a finite positive constant in probability,
and the credible set $C_n$ thus has frequentist length of order $1/\sqrt{n}$. Moreover,
the 'plug-in' posterior $\Pi_n \circ L^{-1}$ has the approximate shape of a normal distribution
centered at the efficient estimator $L(\mathbb{X}^{(n)})$ of $L(f_0)$ with variance $\|g_L\|_2^2/n$.
This implies in particular that the width of the credible set $C_n$ is asymptotically
efficient from the semiparametric perspective; in fact $\|g_L\|_2^2$ is the semiparametric
Cramér-Rao bound for estimating $L(f_0)$ from observations in the Gaussian
white noise model.

The fact that any integral functional $\int_0^1 f(t) g_L(t)\,dt$, $g_L \in H_2^s$, $s > 1/2$, is
covered gives rise to a rich class of examples. For instance the functionals

$$\int_0^1 t^\alpha f(t)\,dt, \quad \int_0^1 |t|^\alpha f(t)\,dt, \quad \alpha \in \mathbb{N},$$

are covered, so nonparametric Bayes posteriors can be used with good confidence
for inference on moment type functionals. The assumption $s > 1/2$ is intrinsic
to our methods and cannot be relaxed.
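For a moment functional with representer $g_L(t) = t^a$ the efficiency bound is explicit, since $\|g_L\|_2^2 = \int_0^1 t^{2a}\,dt = 1/(2a+1)$. The sketch below computes the limiting half-width of the credible interval (16) that the normal approximation in Theorem 4 suggests, namely $z_{1-\alpha/2}\,\|g_L\|_2/\sqrt{n}$ with $z_{1-\alpha/2}$ the standard normal quantile. This identification of the limit of $M_n$ is an inference from the stated normal shape rather than a formula quoted from the text, and the function names are illustrative.

```python
import math
from statistics import NormalDist


def moment_representer_sq_norm(exponent: int) -> float:
    """Squared L^2 norm of g_L(t) = t^exponent on [0, 1], i.e. 1 / (2 * exponent + 1)."""
    return 1.0 / (2 * exponent + 1)


def limiting_half_width(exponent: int, n: int, alpha: float = 0.05) -> float:
    """Half-width z_{1 - alpha/2} * ||g_L||_2 / sqrt(n) implied by the N(0, ||g_L||_2^2) limit."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return z * math.sqrt(moment_representer_sq_norm(exponent)) / math.sqrt(n)


if __name__ == "__main__":
    # First moment (exponent 1) at noise level n = 100 and alpha = 0.05.
    print(limiting_half_width(1, 100))
```

Quadrupling $n$ halves the half-width, which is the $1/\sqrt{n}$ scaling of the credible set discussed above.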



2.4.2. Smooth nonlinear functionals

We next consider statistical inference for nonlinear functionals of $f_0$ that satisfy
a good quadratic approximation in $L^2$ at $f_0$; more precisely, we assume that
$\Psi : L^2 \to \mathbb{R}$ satisfies

$$\Psi(f_0 + h) - \Psi(f_0) = D\Psi_{f_0}[h] + O(\|h\|_2^2), \tag{17}$$

uniformly in $h \in L^2$ and for some $D\Psi_{f_0} : L^2 \to \mathbb{R}$ linear and continuous that
has a (nonzero) $L^2$-Riesz representer $\dot\Psi_{f_0} \in H_2^s$ for some $s > 1/2$. This setting
includes several standard examples discussed in more detail at the end of this
section, but also the linear functionals discussed above.

Note that now $\Psi$ cannot necessarily be evaluated at $\mathbb{X}^{(n)}$ (think of $\Psi(f) =
\|f\|_2^2$). However, since the posterior is supported in $L^2$ with probability one, the
following Bayesian credible set can be constructed for $\Psi(f_0)$: For $\Pi_n = \Pi(\cdot \mid X^{(n)})$
the posterior distribution, set $\Pi_n^\Psi = \Pi_n \circ \Psi^{-1}$ and solve for reals $\mu_n, \nu_n$ such
that

$$\Pi_n^\Psi((-\infty, \mu_n]) = \Pi_n^\Psi((\nu_n, +\infty)) = \frac{\alpha}{2}, \tag{18}$$

that is, $\mu_n, \nu_n$ are the $\alpha/2$ and $1 - \alpha/2$ quantiles of $\Pi_n^\Psi$. Set

$$C_n = C_n(X^{(n)}, \alpha) = (\mu_n, \nu_n]. \tag{19}$$

Theorem 5. Suppose the weak Bernstein-von Mises phenomenon in the sense
of Definition 1 holds. Consider a functional $\Psi$ such that (17) is satisfied. Assume
moreover either that $\Psi$ is linear or that for some sequence $r_n = o(n^{-1/2})$,

$$\Pi_n(f : \|f - f_0\|_2^2 > r_n) = o_P(1).$$

Let $C_n$ be the credible set from (19) with $\mu_n, \nu_n$ chosen as in (18). Then, as
$n \to \infty$,

$$P_{f_0}^n(\Psi(f_0) \in C_n) \to 1 - \alpha.$$

Similar to the previous result, the shape of the 'plug-in' posterior $\Pi_n \circ \Psi^{-1}$
is approximately Gaussian, this time centered at $\Psi(f_0) + \langle \dot\Psi_{f_0}, \mathbb{W} \rangle/\sqrt{n}$, and
with variance $\|\dot\Psi_{f_0}\|_2^2/n$. More precisely, for $\beta_{\mathbb{R}}$ the bounded-Lipschitz metric
for weak convergence, and with $\theta_t$ as in Theorem 4,

$$\beta_{\mathbb{R}}\left( \Pi_n^\Psi \circ \theta_{\Psi(f_0) + \langle \dot\Psi_{f_0}, \mathbb{W} \rangle/\sqrt{n}}^{-1},\ N(0, \|\dot\Psi_{f_0}\|_2^2) \right) \to^{P_{f_0}^n} 0.$$

In fact, as follows from the proof of Theorem 5, the random quantiles $\mu_n, \nu_n$
admit the expansion

$$\mu_n = \Psi(f_0) + \frac{1}{\sqrt{n}} \langle \dot\Psi_{f_0}, \mathbb{W} \rangle + \frac{\Phi^{-1}(\alpha/2)}{\sqrt{n}} + o_P(1/\sqrt{n}),$$

$$\nu_n = \Psi(f_0) + \frac{1}{\sqrt{n}} \langle \dot\Psi_{f_0}, \mathbb{W} \rangle + \frac{\Phi^{-1}(1 - \alpha/2)}{\sqrt{n}} + o_P(1/\sqrt{n}),$$

with $\Phi$ the distribution function of a $N(0, \|\dot\Psi_{f_0}\|_2^2)$ variable. Again, $\|\dot\Psi_{f_0}\|_2^2$ is
the semiparametric efficiency bound for estimating $\Psi(f_0)$ in the Gaussian white
noise model, which shows that the asymptotic width of the credible set $C_n$ for
$\Psi(f_0)$ is optimal in the semiparametric sense.

If $f_0$ is $\gamma$-Hölder for some $\gamma > 1/2$ then the priors from Condition 1 with
$\sigma_l$ chosen as in Remark 1 are admissible in the above theorem with $r_n =
n^{-2\gamma/(2\gamma+1)}$; cf. Corollaries 1, 2 and Subsection 3.4 below.

Examples include the standard quadratic functionals such as $\Psi(f) = \int f^2(t)\,dt$
or composite functionals of the form $\Psi(f) = \int \phi(f(x), x)\,dx$ and the like. Some
functionals may necessitate some straightforward modifications of our proofs:
For instance $\int |f(t)|^p\,dt$ requires differentiation on $L^p$ instead of $L^2$, and for the
entropy functional $\int f(t) \log f(t)\,dt$ one assumes $f_0 \ge \zeta > 0$ on $[0,1]$ and
differentiates $\Psi$ on $L^\infty$. In these situations, to control remainder terms, one may use
contraction results in $L^p$, $2 < p \le \infty$, instead of $L^2$, such as the ones in [23].
Our assumption $\gamma > 1/2$ falls short of the critical assumption $\gamma > 1/4$ necessary
for $1/\sqrt{n}$-estimability of some of these functionals, a phenomenon intrinsic to
general plug-in procedures.
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3. Bernstein-von Mises Theorems in White Noise 

We now develop general tools that allow us to prove that priors satisfy the Bernstein-von
Mises phenomenon in the sense of Definition 1, and show how they can be
successfully applied to a wide variety of natural classes of product priors.

For $f \in L^2$ consider again observing a random trajectory in the white noise
model (4). Given an orthonormal basis from Definition 2, the white noise model
is equivalent to observing the action of $X^{(n)}$ on the basis, i.e.,

$$X^{(n)}_{lk} = \theta_{lk} + \frac{1}{\sqrt n}\,\varepsilon_{lk}, \qquad k \in \mathcal Z_l,\ l \in \mathcal L,$$

where $\theta_{lk} = \langle f, \psi_{lk}\rangle$, $\varepsilon_{lk} \sim N(0,1)$. Let $\Pi$ be a prior Borel probability
distribution on $L^2$ which induces a prior, also denoted by $\Pi$, on infinite sequences
$\{\theta_{lk}\} \in \ell^2$. Let $\Pi(\cdot \mid X^{(n)})$ be the posterior distribution and let $\Pi(\theta_{lk} \mid X^{(n)})$
denote the marginal posterior on the coordinate $\theta_{lk}$.
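A minimal sketch of the sequence-space form of the model may help fix ideas. It assumes a wavelet-type index set with $2^l$ coefficients at level $l$ and a hypothetical truth with $\theta_{0,lk} = 2^{-l(1/2+\gamma)}$; both choices, and the function name, are illustrative assumptions rather than part of the paper.

```python
import numpy as np


def simulate_sequence_model(theta, n, rng):
    """Draw X_lk = theta_lk + eps_lk / sqrt(n) with independent N(0, 1) noise eps_lk."""
    theta = np.asarray(theta, dtype=float)
    eps = rng.standard_normal(theta.shape)
    return theta + eps / np.sqrt(n)


rng = np.random.default_rng(0)
gamma, levels, n = 1.0, 8, 10_000
# Hypothetical truth: 2^l coefficients at level l, all equal to 2^{-l(1/2 + gamma)}.
theta0 = np.concatenate(
    [np.full(2**l, 2.0 ** (-l * (0.5 + gamma))) for l in range(levels)]
)
x = simulate_sequence_model(theta0, n=n, rng=rng)
print(x[:3], theta0[:3])
```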

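3.1. Contraction Results in $H_2^{-s}$

In this subsection we consider priors of the form $\Pi = \otimes_{lk}\, \pi_{lk}$ defined on the
coordinates of the orthonormal basis $\{\psi_{lk}\}$, where $\pi_{lk}$ are probability distributions
with Lebesgue density $\varphi_{lk}$ on the real line. Further assume, for some fixed
density $\varphi$ on the real line,

$$\varphi_{lk}(\cdot) = \frac{1}{\sigma_l}\,\varphi\Big(\frac{\cdot}{\sigma_l}\Big) \quad \forall k \in \mathcal Z_l, \quad\text{with } \sigma_l > 0.$$

Condition 1. Suppose that there exists a finite constant $M > 0$ s.t.

$$\text{(P1)} \qquad \sup_{l,k} \frac{|\theta_{0,lk}|}{\sigma_l} \le M, \qquad \theta_0 = \{\theta_{0,lk}\} = \{\langle f_0, \psi_{lk}\rangle\}.$$

Suppose also that $\varphi$ is s.t. there exists $\tau > M$ and $0 < c_\varphi \le C_\varphi < \infty$ with

$$\text{(P2)} \qquad \varphi(x) \le C_\varphi \ \ \forall x \in \mathbb R, \qquad \varphi(x) \ge c_\varphi \ \ \forall x \in (-\tau, \tau), \qquad \int_{\mathbb R} x^2 \varphi(x)\,dx < \infty.$$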

Some discussion of this condition is in order: We allow for a rich variety of base
priors $\varphi$, such as Gaussian, sub-Gaussian, Laplace, most Student laws, or more
generally any law with positive continuous density and finite second moment,
but also uniform priors with large enough support. The full prior on $f$ considered
here is thus a sum of independent terms over the basis $\{\psi_{lk}\}$, including many,
especially non-Gaussian, processes. For Gaussian processes Condition 1 applies
simply by verifying that the $L^2$-basis provided by the Karhunen-Loève expansion
of the process satisfies the conditions of Definition 2. This includes in particular
Brownian motion: The corresponding $\varphi$ is then the standard Gaussian density
and $\sigma_l = 1/(\pi(l + \tfrac12))$ are the square-roots of the eigenvalues of the operator.
Through condition (P1), this allows for signals $f_0 = (\theta_{0,lk})$ whose coefficients
on the basis decrease at least as fast as $1/l$. For primitives of Brownian motion
similar remarks apply, with stronger but natural decay restrictions on $\langle f_0, \psi_{lk}\rangle$.
In principle making the prior rougher allows for more signals through Condition
(P1), but this may harm the performance of the posterior in stronger loss
functions than the one considered in the next theorem.
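The role of (P1) for the Brownian motion scales $\sigma_l = 1/(\pi(l + \tfrac12))$ can be checked numerically. The two coefficient sequences below are hypothetical signals chosen only to contrast decay faster and slower than $1/l$; the helper name is an assumption made for this sketch, and only finitely many indices are inspected.

```python
import numpy as np


def p1_constant(theta0, sigma):
    """Return max_l |theta_{0,l}| / sigma_l over the supplied indices.

    Over the full index set this is the smallest M for which (P1) can hold.
    """
    return float(np.max(np.abs(theta0) / sigma))


l = np.arange(0, 2000)
sigma_bm = 1.0 / (np.pi * (l + 0.5))       # Brownian-motion scales from the text
theta_ok = 1.0 / (1.0 + l) ** 1.5           # hypothetical signal, decays faster than 1/l
theta_bad = 1.0 / (1.0 + l) ** 0.75         # hypothetical signal, decays slower than 1/l

M_ok = p1_constant(theta_ok, sigma_bm)      # stays bounded as more indices are added
M_bad = p1_constant(theta_bad, sigma_bm)    # keeps growing with the number of indices
print(M_ok, M_bad)
```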

Theorem 6. Consider data generated from equation (4) under a fixed function
$f_0$ with coefficients $\theta_0 = \{\theta_{0,lk}\} = \{\langle f_0, \psi_{lk}\rangle\}$, and denote by $P^n_{f_0}$ the distribution
of $X^{(n)}$. Then if the product prior $\Pi$ and $f_0$ satisfy Condition 1 we have for every
$s > 1/2$ that, as $n \to \infty$,

$$P^n_{f_0} \int \|f - f_0\|_{-s,2}^2 \, d\Pi(f \mid X^{(n)}) = O\Big(\frac1n\Big).$$



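Proof. We write $X = X^{(n)}$ and $E = P^n_{f_0}$ throughout the proof to ease notation.
We also decompose the indexing set $\mathcal L$ into

$$J_n := \{l \in \mathcal L,\ \sqrt n\,\sigma_l > S_0\}$$

and its complement, where $S_0$ is a fixed positive constant. The quantity we
wish to bound equals, by definition of the negative Sobolev norm and Fubini's
theorem

$$\sum_{l,k} a_l^{-2s}\, E \int (\theta_{lk} - \theta_{0,lk})^2 \, d\Pi(\theta_{lk} \mid X).$$

Define further

$$B_{lk}(X) := \int (\theta_{lk} - \theta_{0,lk})^2 \, d\Pi(\theta_{lk} \mid X)$$

whose $P^n_{f_0}$-expectation we now bound.

Using the independence structure of the prior we have $\Pi(\theta_{lk} \mid X) = \Pi(\theta_{lk} \mid X_{lk})$,
and under $P^n_{f_0}$,

$$B_{lk}(X) = \frac{\int (\theta_{lk} - \theta_{0,lk})^2\, e^{-\frac n2 (\theta_{lk} - \theta_{0,lk})^2 + \sqrt n\,\varepsilon_{lk}(\theta_{lk} - \theta_{0,lk})}\, \varphi_{lk}(\theta_{lk})\, d\theta_{lk}}{\int e^{-\frac n2 (\theta_{lk} - \theta_{0,lk})^2 + \sqrt n\,\varepsilon_{lk}(\theta_{lk} - \theta_{0,lk})}\, \varphi_{lk}(\theta_{lk})\, d\theta_{lk}}$$

$$= \frac1n\, \frac{\int v^2\, e^{-\frac{v^2}{2} + \varepsilon_{lk} v}\, \frac{1}{\sqrt n \sigma_l}\, \varphi\Big(\frac{\theta_{0,lk} + n^{-1/2} v}{\sigma_l}\Big)\, dv}{\int e^{-\frac{v^2}{2} + \varepsilon_{lk} v}\, \frac{1}{\sqrt n \sigma_l}\, \varphi\Big(\frac{\theta_{0,lk} + n^{-1/2} v}{\sigma_l}\Big)\, dv} =: \frac1n\, \frac{N_{lk}}{D_{lk}}(\varepsilon_{lk}).$$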

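Consider first indices in $J_n^c$. Taking a smaller domain of integration in the
denominator makes the integral smaller:

$$D_{lk}(\varepsilon_{lk}) \ge \int_{-\sqrt n \sigma_l}^{\sqrt n \sigma_l} e^{-\frac{v^2}{2} + \varepsilon_{lk} v}\, \frac{1}{\sqrt n \sigma_l}\, \varphi\Big(\frac{\theta_{0,lk} + n^{-1/2} v}{\sigma_l}\Big)\, dv.$$

To simplify the notation we suppose that $\tau > M + 1$. If this is not the case,
one multiplies the bounds of the integral in the last display by a small enough
constant. The argument of the function $\varphi$ in the previous display stays in
$[-(M+1), M+1]$ under (P1). Under assumption (P2) this implies that the value of
$\varphi$ in the last expression is bounded from below by $c_\varphi$. Next applying Jensen's
inequality with the logarithm function

$$\log D_{lk}(\varepsilon_{lk}) \ge \log(2c_\varphi) - \int_{-\sqrt n \sigma_l}^{\sqrt n \sigma_l} \frac{v^2}{2}\, \frac{dv}{2\sqrt n \sigma_l} + \varepsilon_{lk} \int_{-\sqrt n \sigma_l}^{\sqrt n \sigma_l} v\, \frac{dv}{2\sqrt n \sigma_l} = \log(2c_\varphi) - (\sqrt n \sigma_l)^2/6.$$

Thus, $D_{lk}(\varepsilon_{lk}) \ge 2c_\varphi\, e^{-(\sqrt n \sigma_l)^2/6}$, which is bounded away from zero for indices
in $J_n^c$. Now about the numerator, let us split the integral defining $N_{lk}$ into two
parts $\{v : |v| \le \sqrt n \sigma_l\}$ and $\{v : |v| > \sqrt n \sigma_l\}$. That is $N_{lk}(\varepsilon_{lk}) = (I) + (II)$.
Taking the expectation of the first term and using Fubini's theorem,

$$E(I) = \int_{-\sqrt n \sigma_l}^{\sqrt n \sigma_l} v^2\, e^{-\frac{v^2}{2}}\, E\big[e^{\varepsilon_{lk} v}\big]\, \frac{1}{\sqrt n \sigma_l}\, \varphi\Big(\frac{\theta_{0,lk} + n^{-1/2} v}{\sigma_l}\Big)\, dv \le 2 n \sigma_l^2\, C_\varphi / 3.$$
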
The expectation of the second term is bounded by first applying Fubini's theo- 
rem as before and then changing variables back 



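$$E(II) = \int_{|v| > \sqrt n \sigma_l} v^2\, e^{-\frac{v^2}{2}}\, E\big[e^{\varepsilon_{lk} v}\big]\, \frac{1}{\sqrt n \sigma_l}\, \varphi\Big(\frac{\theta_{0,lk} + n^{-1/2} v}{\sigma_l}\Big)\, dv$$

$$= \int_{|\sqrt n \sigma_l u - \sqrt n\, \theta_{0,lk}| > \sqrt n \sigma_l} \big(\sqrt n \sigma_l u - \sqrt n\, \theta_{0,lk}\big)^2\, \varphi(u)\, du$$

$$\le 2 n \sigma_l^2 \left[ \int_{\mathbb R} u^2 \varphi(u)\, du + \frac{\theta_{0,lk}^2}{\sigma_l^2} \int_{\mathbb R} \varphi(u)\, du \right].$$

Thus, using (P1) again, $E(I) + E(II)$ is bounded on $J_n^c$ by a fixed constant
times $n\sigma_l^2$. In particular, there exists a fixed constant independent of $n, k, l$ such
that $E(n B_{lk}(X))$ is bounded from above by a constant on $J_n^c$.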

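Now about the indices in $J_n$. For such $l, k$, using (P1)-(P2) one can find
$L_0 > 0$ depending only on $S_0, M, \tau$ such that, for any $v$ in $(-L_0, L_0)$,

$$\varphi\Big(\frac{\theta_{0,lk} + n^{-1/2} v}{\sigma_l}\Big) \ge c_\varphi.$$

Thus the denominator $D_{lk}(\varepsilon_{lk})$ can be bounded from below by

$$D_{lk}(\varepsilon_{lk}) \ge c_\varphi \int_{-L_0}^{L_0} e^{-\frac{v^2}{2} + \varepsilon_{lk} v}\, \frac{1}{\sqrt n \sigma_l}\, dv.$$

On the other hand, the numerator can be bounded above by

$$N_{lk}(\varepsilon_{lk}) \le C_\varphi \int v^2\, e^{-\frac{v^2}{2} + \varepsilon_{lk} v}\, \frac{1}{\sqrt n \sigma_l}\, dv.$$

Putting these two bounds together leads to

$$B_{lk}(\varepsilon_{lk}) \le \frac1n\, \frac{C_\varphi \int v^2\, e^{-\frac{v^2}{2} + \varepsilon_{lk} v}\, dv}{c_\varphi \int_{-L_0}^{L_0} e^{-\frac{v^2}{2} + \varepsilon_{lk} v}\, dv}.$$
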
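The last quantity has a distribution independent of $l, k$. Let us thus show that

$$Q(L_0) := E\left[ \frac{\int v^2\, e^{-\frac12 (v - \varepsilon)^2}\, dv}{\int_{-L_0}^{L_0} e^{-\frac12 (v - \varepsilon)^2}\, dv} \right]$$

is finite for every $L_0 > 0$, where $\varepsilon \sim N(0, 1)$. In the numerator we substitute
$u = v - \varepsilon$. Using the inequality $(u + \varepsilon)^2 \le 2u^2 + 2\varepsilon^2$, the second moment of
a standard normal variable appears, and this leads to the bound

$$Q(L_0) \le C\, E\left[ \frac{1 + \varepsilon^2}{\int_{-L_0}^{L_0} e^{-\frac12 (v - \varepsilon)^2}\, dv} \right]$$

for some finite constant $C > 0$. Denote by $g$ the density of a standard normal
variable, by $\Phi$ its distribution function and $\bar\Phi = 1 - \Phi$. It is enough to prove
that the following quantity is finite

$$q(L_0) := \int_{-\infty}^{+\infty} \frac{(1 + u^2)\, g(u)}{\bar\Phi(u - L_0) - \bar\Phi(u + L_0)}\, du = 2 \int_{0}^{+\infty} \frac{(1 + u^2)\, g(u)}{\bar\Phi(u - L_0) - \bar\Phi(u + L_0)}\, du,$$

since the integrand is an even function. Using the standard inequalities

$$\frac{1}{\sqrt{2\pi}}\, \frac{u^2}{1 + u^2}\, \frac1u\, e^{-u^2/2} \le \bar\Phi(u) \le \frac{1}{\sqrt{2\pi}}\, \frac1u\, e^{-u^2/2}, \qquad u \ge 1,$$

it follows that for any $\delta > 0$, one can find $M_\delta > 0$ such that, for any $u \ge M_\delta$, it
holds

$$(1 - \delta)\, \frac1u\, e^{-u^2/2} \le \sqrt{2\pi}\, \bar\Phi(u) \le \frac1u\, e^{-u^2/2}, \qquad u \ge M_\delta.$$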

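Set $A_\delta = 2L_0 \vee M_\delta$. Then for $\delta < 1 - e^{-2L_0^2}$ we deduce

$$q(L_0) \le 2 \int_0^{A_\delta} \frac{(1 + u^2)\, g(u)}{\bar\Phi(A_\delta - L_0) - \bar\Phi(A_\delta + L_0)}\, du + 2\sqrt{2\pi} \int_{A_\delta}^{+\infty} \frac{(u - L_0)(1 + u^2)\, g(u)\, e^{(u - L_0)^2/2}}{1 - \delta - e^{-2L_0 u}}\, du$$

$$\le C(A_\delta, L_0) + \frac{2\, e^{L_0^2/2}}{1 - \delta - e^{-2L_0 A_\delta}} \int_{A_\delta}^{+\infty} u (1 + u^2)\, e^{-L_0 u}\, du < +\infty.$$

Conclude that $\sup_{l,k} P^n_{f_0} |B_{lk}(X)| = O(1/n)$. Since $\sum_{l,k} a_l^{-2s} < \infty$ the result
follows. □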
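For a Gaussian base prior the quantity $E\,B_{lk}(X)$ bounded in the proof is available in closed form, which gives a quick sanity check of the $O(1/n)$ claim. The sketch below assumes the conjugate case $\theta_{lk} \sim N(0, \sigma_l^2)$, for which the posterior is $N(\kappa X_{lk}, \kappa/n)$ with $\kappa = n\sigma_l^2/(1 + n\sigma_l^2)$, together with the hypothetical scales $\sigma_l = 2^{-l(1/2+\gamma)}$ and the extreme truth $\theta_{0,lk} = M\sigma_l$ allowed by (P1). It is not the argument of the proof, which covers general $\varphi$.

```python
import numpy as np


def expected_B_gaussian(theta0, sigma, n):
    """E[B_lk] for a N(0, sigma^2) prior and X_lk ~ N(theta0, 1/n)."""
    x = n * sigma**2
    post_var = sigma**2 / (1.0 + x)              # posterior variance
    sq_bias = theta0**2 / (1.0 + x) ** 2         # squared bias of the posterior mean
    mean_var = n * sigma**4 / (1.0 + x) ** 2     # sampling variance of the posterior mean
    return post_var + sq_bias + mean_var


M = 1.0
sigma = 2.0 ** (-np.arange(0, 30) * 1.5)         # sigma_l = 2^{-l(1/2 + gamma)}, gamma = 1
theta0 = M * sigma                               # extreme case allowed by (P1)
sup_nB = {n: float(np.max(n * expected_B_gaussian(theta0, sigma, n)))
          for n in (10, 10**3, 10**6)}
print(sup_nB)
```

In this conjugate case one can check by hand that $n\,E\,B_{lk} \le 2 + M^2/4$ for all $l$, $k$ and $n$, in line with the uniform $O(1/n)$ bound.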
Theorem 7. With the notation of Theorem 6, suppose the product prior $\Pi$ and
$f_0$ satisfy Condition 1. Then

$$P^n_{f_0} \int \|f - f_0\|_2^2 \, d\Pi(f \mid X^{(n)}) = O\Big( \sum_{l,k} (\sigma_l^2 \wedge n^{-1}) \Big).$$

Proof. With the notation used in the proof of Theorem 6, using Fubini's Theorem,

$$P^n_{f_0} \int \|f - f_0\|_2^2 \, d\Pi(f \mid X^{(n)}) = \sum_{l,k} P^n_{f_0} \int (\theta_{lk} - \theta_{0,lk})^2 \, d\Pi(\theta_{lk} \mid X) = \sum_{l,k} P^n_{f_0} B_{lk}(X).$$

In the proof of Theorem 6, the following two bounds have been obtained, with
the notation $J_n := \{l \in \mathcal L,\ \sqrt n \sigma_l > S_0\}$,

$$\sup_{l \in J_n,\, k} P^n_{f_0} B_{lk}(X) = O(n^{-1}),$$

$$P^n_{f_0} B_{lk}(X) = O(\sigma_l^2) \quad \text{uniformly in } l \in J_n^c,\ k.$$

For any $l \in J_n^c$, by definition of $J_n$ it holds $\sigma_l^2 \le S_0^2 n^{-1}$, thus $\sigma_l^2 \le (1 \vee S_0^2)(\sigma_l^2 \wedge
n^{-1})$. Similarly, if $l \in J_n$ we have $n^{-1} \le (1 \vee S_0^{-2})(\sigma_l^2 \wedge n^{-1})$. □



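Corollary 1. Set $\sigma_l = |l|^{-\frac12 - \gamma}$ or $\sigma_l = 2^{-(\frac12 + \gamma) l}$, depending on the chosen S-regular
basis of type either a) or b). Suppose that the conditions of Theorem 7
are satisfied. Then

$$P^n_{f_0} \int \|f - f_0\|_2^2 \, d\Pi(f \mid X^{(n)}) = O\big( n^{-\frac{2\gamma}{2\gamma + 1}} \big).$$

Proof. For both types of basis $\sum_l |\mathcal Z_l| (\sigma_l^2 \wedge n^{-1}) = O(n^{-\frac{2\gamma}{2\gamma+1}})$. □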
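The sum in the proof of Corollary 1 is easy to evaluate numerically. The following sketch assumes the wavelet-type case $|\mathcal Z_l| = 2^l$ and $\sigma_l = 2^{-(1/2+\gamma)l}$, truncates the sum at a large level (an approximation made here for computation only), and compares it with $n^{-2\gamma/(2\gamma+1)}$.

```python
import numpy as np


def l2_rate_bound(n, gamma, levels=60):
    """Sum over l < levels of |Z_l| * min(sigma_l^2, 1/n), with |Z_l| = 2^l."""
    l = np.arange(levels)
    sigma2 = 2.0 ** (-l * (1.0 + 2.0 * gamma))   # sigma_l^2 = 2^{-l(1 + 2 gamma)}
    return float(np.sum(2.0**l * np.minimum(sigma2, 1.0 / n)))


gamma = 1.0
ratios = [l2_rate_bound(n, gamma) / n ** (-2 * gamma / (2 * gamma + 1))
          for n in (10**2, 10**4, 10**6)]
print(ratios)   # the ratios stay bounded, consistent with the O(.) statement
```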



Remark 1. The previous choice of $\sigma_l$ entails a regularity condition on $f_0$ through
Condition (P1), namely $\sup_k |\theta_{0,lk}| \le M\sigma_l$. If $\sigma_l = 2^{-(\gamma + \frac12) l}$ this amounts to the
standard Hölderian condition if one uses a CDV wavelet basis, or a periodised
wavelet basis: any $f_0$ in the Besov space $B^\gamma_{\infty\infty}$ satisfies (P1) for such bases.
For other bases similar remarks apply.

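Corollary 2. Denote by $\bar f_n := \bar f_n(X^{(n)}) := \int f \, d\Pi(f \mid X^{(n)})$ the posterior mean
associated to the posterior distribution. Under the conditions of Theorem 7,

$$P^n_{f_0} \|\bar f_n - f_0\|_2^2 = O\Big( \sum_{l,k} (\sigma_l^2 \wedge n^{-1}) \Big).$$

Proof. The Cauchy-Schwarz inequality implies

$$P^n_{f_0} \|\bar f_n - f_0\|_2^2 = P^n_{f_0} \sum_{l,k} \Big[ \int (\theta_{lk} - \theta_{0,lk}) \, d\Pi(\theta_{lk} \mid X) \Big]^2 \le P^n_{f_0} \sum_{l,k} \int (\theta_{lk} - \theta_{0,lk})^2 \, d\Pi(\theta_{lk} \mid X)$$

and one can apply Theorem 7. □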
3.2. Convergence of Finite Dimensional Distributions

Consider again the posterior distribution $\Pi_n = \Pi(\cdot \mid X^{(n)})$ on $L^2$ from the
beginning of this section (not necessarily arising from a product measure). Let
$V$ be any of the finite-dimensional projection subspaces of $L^2$ defined in Section
1.2, equipped with the $L^2$-norm, and recall that $\pi_V$ denotes the orthogonal
projection onto $V$ in $L^2$. For $z \in H_2^{-s}$ define the transformation

$$T_z = T_{z,V} : f \mapsto \sqrt n\, \pi_V (f - z)$$

from $H_2^{-s}$ to $V$, and consider the image measure $\Pi_n \circ T_z^{-1}$. The finite-dimensional
space $V$ carries a natural Lebesgue product measure on it.

Condition 2. Suppose that $\Pi \circ \pi_V^{-1}$ has a Lebesgue-density $d\Pi_V$ in a neighborhood
of $\pi_V(f_0)$ that is continuous and positive at $\pi_V(f_0)$. Suppose also that for
every $\delta > 0$ there exists a fixed $L^2$-norm ball $C = C_\delta$ in $V$ such that, for $n$ large
enough,

$$P^n_{f_0} (\Pi_n \circ T_{f_0}^{-1})(C^c) < \delta.$$

This condition requires that the projected prior has a continuous density
at $\pi_V(f_0)$ and that the image of the posterior distribution under the finite-dimensional
projection onto $V$ concentrates on a $1/\sqrt n$-neighborhood of $\pi_V(f_0)$.
It thus ensures the classical finite-dimensional assumptions required for a (local)
Bernstein-von Mises theorem in the space $V$, and the following theorem is proved
in a way similar to the classical parametric proof (Chapter 10 in [35]) due to Le
Cam [27].

Denote by $\|\cdot\|_{TV}$ the total variation norm on the space of finite signed
measures on $V$, and let $N(0, I)$ be a standard Gaussian measure on $V$.

Theorem 8. Consider data generated from equation (4) under a fixed function
$f_0$ and denote by $P^n_{f_0}$ the distribution of $X^{(n)}$. Assume Condition 2. Then we
have, as $n \to \infty$,

$$\|\Pi_n \circ T_{X^{(n)}}^{-1} - N(0, I)\|_{TV} \xrightarrow{P^n_{f_0}} 0.$$

Proof. If $W_V = \pi_V(W)$, a standard Gaussian variable on $V$, and if $\tilde\Pi_{n,V} =
\Pi_n \circ T_{f_0}^{-1}$, it suffices to prove that $\|\tilde\Pi_{n,V} - N(W_V, I)\|_{TV}$ converges to zero in
$P^n_{f_0}$-probability. In the following, denote by $\lambda$ the Lebesgue measure on $V$ and
by $\lambda_C$ its restriction to a measurable set $C$.

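Define $\tilde\Pi^C_{n,V}$, the posterior distribution $\tilde\Pi_{n,V}$ based on the prior restricted to
a measurable set $C$ and renormalised, that is, for $B$ a Borel subset of $V$,

$$\tilde\Pi^C_{n,V}(B) = \frac{\int_B e^{-\|h\|^2/2 + \langle h, W_V\rangle}\, d\tilde\Pi^C_V(h)}{\int e^{-\|g\|^2/2 + \langle g, W_V\rangle}\, d\tilde\Pi^C_V(g)}$$

where $\tilde\Pi_V = \Pi \circ T_{f_0}^{-1}$ and where $\mu^C(B) = \mu(B \cap C)/\mu(C)$ for any probability
measure $\mu$. A simple computation shows

$$P^n_{f_0} \|\tilde\Pi_{n,V} - \tilde\Pi^C_{n,V}\|_{TV} \le 2 P^n_{f_0} \tilde\Pi_{n,V}(C^c) < 2\delta,$$

using the hypothesis of the theorem, and likewise, if $N^C(W_V, I)$ is the restricted
and renormalised normal distribution, $\|N(W_V, I) - N^C(W_V, I)\|_{TV} < \delta$, for
every $\delta > 0$ and for $C = C_\delta$ a ball of large enough radius. It thus suffices to
prove

$$P^n_{f_0} \|\tilde\Pi^C_{n,V} - N^C(W_V, I)\|_{TV} < \delta$$

for every $\delta > 0$ and $n$ large enough. The total variation distance $\|\tilde\Pi^C_{n,V} -
N^C(W_V, I)\|_{TV}$ is bounded by twice

$$\int \left( 1 - \frac{dN^C(W_V, I)(h)}{1_C\, e^{-\|h\|^2/2 + \langle h, W_V\rangle}\, d\tilde\Pi_V(h) \big/ \int_C e^{-\|g\|^2/2 + \langle g, W_V\rangle}\, d\tilde\Pi_V(g)} \right)^{+} d\tilde\Pi^C_{n,V}(h)$$

$$\le \int\!\!\int \left( 1 - \frac{e^{-\|g\|^2/2 + \langle g, W_V\rangle}\, d\tilde\Pi_V(g)\, dN^C(W_V, I)(h)}{e^{-\|h\|^2/2 + \langle h, W_V\rangle}\, d\tilde\Pi_V(h)\, dN^C(W_V, I)(g)} \right)^{+} dN^C(W_V, I)(g)\, d\tilde\Pi^C_{n,V}(h)$$

$$\le c \int\!\!\int \left( 1 - \frac{e^{-\|g\|^2/2 + \langle g, W_V\rangle}\, d\tilde\Pi_V(g)\, dN^C(W_V, I)(h)}{e^{-\|h\|^2/2 + \langle h, W_V\rangle}\, d\tilde\Pi_V(h)\, dN^C(W_V, I)(g)} \right)^{+} d\lambda_C(g)\, d\tilde\Pi^C_{n,V}(h),$$

where we used $(1 - EY)^+ \le E(1 - Y)^+$ in the first inequality and where the
constant $c = c(W_V)$ in the previous display is an upper bound for the density
of $N^C(W_V, I)(g)$ with respect to $\lambda_C$. This constant is random but bounded in
$P^n_{f_0}$-probability since $W_V$ is tight.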

Now note that the last display is random through $W_V$ only. So, considering
convergence to zero under $P^n_{f_0}$ amounts to considering convergence to zero under
the marginal distribution $P^n_{f_0,V}$ on the subspace $V$. Under $P^n_{f_0,V}$, the variable
$W_V$ has law $N(0, I)$. We have to take the expectation of the display with respect
to this law, that we denote by $P_{W_V}$. That is, $dP_{W_V}$ has Lebesgue-density
proportional to $e^{-\|w\|^2/2}\, d\lambda(w)$ on $V$.

Define, for $c(V)$ a normalising constant,

$$dP_C(w) = c(V) \Big( \int e^{-\|k - w\|^2/2}\, d\tilde\Pi^C_V(k) \Big)\, d\lambda(w) \qquad (20)$$

$$= \Big( \int e^{-\|k\|^2/2 + \langle k, w\rangle}\, d\tilde\Pi^C_V(k) \Big)\, dP_{W_V}(w),$$

a probability measure with respect to which $dP_{W_V}$ is contiguous, see Lemma
1 below, so that it suffices to show convergence to zero under $dP_C$ instead of
$dP_{W_V}$. The $P_C$-expectation of the quantity in the last but one display equals
the expectation of the integrand under

$$d\tilde\Pi^C_{n,V}(h)\, dP_C(w)\, d\lambda_C(g) = c(V)\, e^{-\|h - w\|^2/2}\, dw\, d\tilde\Pi^C_V(h)\, d\lambda_C(g),$$

the latter identity following from Fubini's theorem and

$$\frac{e^{-\|h\|^2/2 + \langle h, w\rangle}\, d\tilde\Pi^C_V(h)}{\int e^{-\|m\|^2/2 + \langle m, w\rangle}\, d\tilde\Pi^C_V(m)} \int e^{-\|k - w\|^2/2}\, d\tilde\Pi^C_V(k)\, dw = e^{-\|h - w\|^2/2}\, dw\, d\tilde\Pi^C_V(h),$$

which is less than or equal to, for $n$ large enough and using that $d\Pi_V$ is continuous
at and thus bounded near $\pi_V(f_0)$,

[display not legible in the source; its integrand involves the positive part of $1 - d\tilde\Pi_V(g)/d\tilde\Pi_V(h)$]

which converges to zero by dominated convergence and continuity of $d\Pi_V$ at
$\pi_V(f_0)$. □
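A one-dimensional numerical illustration of the statement of Theorem 8 may be useful. The sketch below takes $V$ spanned by a single basis function, a hypothetical Laplace prior density, and a fixed observed coordinate, and computes on a grid the total variation distance (taken here as half the $L^1$ distance between densities, an assumed normalisation) between the law of $h = \sqrt n(\theta - X)$ under the posterior and $N(0,1)$.

```python
import numpy as np


def tv_to_standard_normal(x_obs, n, prior_density, half_width=10.0, step=1e-3):
    """Half the L1 distance between the posterior law of h = sqrt(n)(theta - x_obs)
    and N(0, 1), computed by a Riemann sum on [-half_width, half_width]."""
    h = np.arange(-half_width, half_width + step, step)
    # Posterior density of h is proportional to exp(-h^2/2) * prior(x_obs + h/sqrt(n)).
    unnorm = np.exp(-0.5 * h**2) * prior_density(x_obs + h / np.sqrt(n))
    post = unnorm / (np.sum(unnorm) * step)
    normal = np.exp(-0.5 * h**2) / np.sqrt(2 * np.pi)
    return 0.5 * float(np.sum(np.abs(post - normal)) * step)


def laplace(t):
    """Hypothetical non-Gaussian prior density, continuous and positive everywhere."""
    return 0.5 * np.exp(-np.abs(t))


tvs = [tv_to_standard_normal(0.3, n, laplace) for n in (10, 10**3, 10**5)]
print(tvs)   # decreasing in n
```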

Lemma 1. The probability measure $P_{W_V}$ is contiguous with respect to the probability
measure $P_C$ defined by (20).

Proof. Suppose $P_C(A_n) \to 0$, for a sequence of measurable sets $A_n$. This implies

$$\int_{A_n} \inf_{k \in C} e^{-\|k\|^2/2 + \langle k, w\rangle}\, dP_{W_V}(w) \to 0.$$

Since $C$ is compact, the infimum of the continuous function in the display is
attained for some fixed $\gamma$ in $C$. Thus

$$\int_{A_n} e^{-\|\gamma\|^2/2 + \langle \gamma, w\rangle}\, e^{-\|w\|^2/2}\, d\lambda(w) = \int_{A_n} e^{-\|\gamma - w\|^2/2}\, d\lambda(w) \to 0.$$

Since $N(\gamma, I)$ and $N(0, I)$ are mutually contiguous (by, e.g., Le Cam's first
lemma, see [35], Chapter 6), the result follows. □

Remark 2. Alternatively, in the case of product priors, one can apply Theorem
1 in [8]. By independence of the Gaussian coordinate experiments $\langle \psi_{lk}, X^{(n)}\rangle =
\theta_{0,lk} + \frac{1}{\sqrt n}\varepsilon_{lk}$, when estimating one or more generally any finite number of the
$\theta_{0,lk}$'s, there is no loss of information with respect to the case where all other
$\theta_{0,lk}$'s would be known. Since the model is exactly LAN, condition (N) in [8] is
satisfied with a zero remainder and condition (C) in [8] amounts to asking that
the full posterior concentrate at some rate $\varepsilon_n \to 0$ in the $L^2$-norm (which for
product priors is implied by Corollary 1).



3.3. A BvM-theorem in $H_2^{-s}$

Let $\Pi_n = \Pi(\cdot \mid X^{(n)})$ be the posterior distribution on $L^2$. Under the following
Condition 3, which depends on a positive real $r > 0$ to be specified in the
sequel, we will prove that a weak Bernstein-von Mises phenomenon holds true
in $H_2^{-s}$. For the product priors considered above we will then verify Condition
3 below.

Condition 3. Suppose for every $\varepsilon > 0$ there exists a constant $0 < M = M(\varepsilon) <
\infty$ independent of $n$ such that, for any $n \ge 1$, some $r > 1/2$,

$$P^n_{f_0}\, \Pi\Big( f : \|f - f_0\|_{-r,2}^2 > \frac{M}{n} \,\Big|\, X^{(n)} \Big) \le \varepsilon. \qquad (21)$$

Assume moreover that the conclusion of Theorem 8 holds true for every $V$ (i.e.,
the finite-dimensional distributions converge).

On $H_2^{-s}$ and for $z \in H_2^{-s}$, define the measurable map

$$\tau_z : f \mapsto \sqrt n (f - z).$$

Recalling the definitions from Section 1.2, consider $\Pi_n \circ \tau_{X^{(n)}}^{-1}$, a Borel probability
measure on $H_2^{-s}$. Let $\mathcal N$ be a standard Gaussian measure on $H_2^{-s}$.

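Theorem 9. Fix $s > r > 1/2$ and assume Condition 3 for such $r$. If $\beta$ is the
bounded-Lipschitz metric for weak convergence of probability measures on $H_2^{-s}$
then, as $n \to \infty$,

$$\beta(\Pi_n \circ \tau_{X^{(n)}}^{-1}, \mathcal N) \xrightarrow{P^n_{f_0}} 0.$$

Proof. It is enough to show that for every $\varepsilon > 0$ there exists $N = N(\varepsilon)$ large
enough such that for all $n \ge N$,

$$P^n_{f_0}\big( \beta(\Pi_n \circ \tau_{X^{(n)}}^{-1}, \mathcal N) > 4\varepsilon \big) < 4\varepsilon.$$

Fix $\varepsilon > 0$ and let $V_J$ be the finite-dimensional subspace of $H_2^{-r}$ spanned by
$\{\psi_{lk} : k \in \mathcal Z_l, l \in \mathcal L, |l| \le J\}$, for some integer $J \ge 1$. Writing $\tilde\Pi_n$ for $\Pi_n \circ \tau_{X^{(n)}}^{-1}$
we see from the triangle inequality

$$\beta(\tilde\Pi_n, \mathcal N) \le \beta(\tilde\Pi_n, \tilde\Pi_n \circ \pi_{V_J}^{-1}) + \beta(\tilde\Pi_n \circ \pi_{V_J}^{-1}, \mathcal N \circ \pi_{V_J}^{-1}) + \beta(\mathcal N \circ \pi_{V_J}^{-1}, \mathcal N).$$

The middle term converges to zero in $P^n_{f_0}$-probability for every $V_J$, by convergence
of the finite-dimensional distributions (Condition 3 and since the total
variation distance dominates $\beta$). Next we handle the first term. Set $Q = M =
M(\varepsilon^2/4)$ and consider the random subset $D$ of $H_2^{-r}$ defined as

$$D = \{ g : \|g + W\|_{-r,2}^2 \le Q \}.$$

Under $P^n_{f_0}$ we have $\tilde\Pi_n(D) = \Pi_n(D_n)$, where $D_n = \{f : \|f - f_0\|_{-r,2}^2 \le Q/n\}$
is the complement of the set appearing in (21). In particular, using Condition 3
and Markov's inequality yields $P^n_{f_0}(\tilde\Pi_n(D^c) > \varepsilon/4) \le \varepsilon^2/\varepsilon = \varepsilon$.

If $Y_n \sim \tilde\Pi_n$ (conditional on $X^{(n)}$), then $\pi_{V_J}(Y_n) \sim \tilde\Pi_n \circ \pi_{V_J}^{-1}$. For $F$ any
bounded function on $H_2^{-s}$ of Lipschitz-norm less than one

$$\Big| \int F \, d\tilde\Pi_n - \int F \, d(\tilde\Pi_n \circ \pi_{V_J}^{-1}) \Big| = \big| E_{\tilde\Pi_n} [ F(Y_n) - F(\pi_{V_J}(Y_n)) ] \big| \le E_{\tilde\Pi_n} \big[ \|Y_n - \pi_{V_J}(Y_n)\|_{-s,2}\, 1_D(Y_n) \big] + 2 \tilde\Pi_n(D^c),$$

where $E_{\tilde\Pi_n}$ denotes expectation under $\tilde\Pi_n$ (given $X^{(n)}$). With $y_{lk} = \langle Y_n, \psi_{lk}\rangle$,

$$E_{\tilde\Pi_n} \big[ \|Y_n - \pi_{V_J}(Y_n)\|_{-s,2}^2\, 1_D(Y_n) \big] = E_{\tilde\Pi_n} \Big[ \sum_{l > J} a_l^{-2s} \sum_k |y_{lk}|^2\, 1_D(Y_n) \Big]$$

$$\le a_J^{2(r - s)}\, E_{\tilde\Pi_n} \big[ \|Y_n\|_{-r,2}^2\, 1_D(Y_n) \big] \le 2 a_J^{2(r - s)} \big[ Q + \|W\|_{-r,2}^2 \big].$$

From the definition of $\beta$ one deduces that for large enough $J$,

$$\beta(\tilde\Pi_n, \tilde\Pi_n \circ \pi_{V_J}^{-1}) \le \varepsilon + 2 \tilde\Pi_n(D^c) + \sqrt 2\, a_J^{r - s}\, \|W\|_{-r,2}.$$

Conclude that $P^n_{f_0}\big( \beta(\tilde\Pi_n, \tilde\Pi_n \circ \pi_{V_J}^{-1}) > 2\varepsilon \big) < 2\varepsilon$ for $J$ large enough, combining
the previous deviation bound for $\tilde\Pi_n(D^c)$ and that $\|W\|_{-r,2}$ is bounded in probability,
as noted after (8) above. A similar (though simpler) argument
leads to

$$P^n_{f_0}\big( \beta(\mathcal N \circ \pi_{V_J}^{-1}, \mathcal N) > \varepsilon \big) < \varepsilon,$$

using again that any random variable with law $\mathcal N$ has square integrable Hilbert-norm
on $H_2^{-r}$. This concludes the proof. □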
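The tail term controlled in the proof, $\sum_{l > J} a_l^{-2s} |\mathcal Z_l|$, which equals $E\|W - \pi_{V_J} W\|_{-s,2}^2$ when the coordinates of $W$ are independent standard normals, can be computed directly. The sketch below assumes the wavelet-type scaling $a_l = |\mathcal Z_l| = 2^l$; under this assumption the sum is geometric and finite exactly when $s > 1/2$. The truncation level is a computational device only.

```python
def sobolev_tail(J, s, levels=200):
    """Sum over J < l < levels of |Z_l| * a_l^{-2s}, with the assumed a_l = |Z_l| = 2^l."""
    return sum(2.0 ** (l * (1.0 - 2.0 * s)) for l in range(J + 1, levels))


for J in (1, 3, 5, 10):
    print(J, sobolev_tail(J, s=1.0))   # equals 2^{-J} up to truncation error when s = 1
```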

3.4. The BvM-theorem for Product Priors

Combining the contraction result Theorem 6 with Theorems 8 and 9 we now
provide a rich class of nonparametric product priors for which the weak Bernstein-von
Mises theorem in the sense of Definition 1 holds. Specific choices
of $\sigma_l$ allow in particular to obtain Bickel and Ritov's plug-in property for the
posterior, relevant in Theorems 3 and 5.

Theorem 10. Suppose the assumptions of Theorem 6 are satisfied and that $\varphi$
is continuous in a neighborhood of $\{\theta_{0,lk}\}$ for every $k \in \mathcal Z_l, l \in \mathcal L$. Let $s > 1/2$.
Then for $\beta$ the bounded Lipschitz metric for weak convergence of probability
measures on $H_2^{-s}$ we have, as $n \to \infty$,

$$\beta(\Pi_n \circ \tau_{X^{(n)}}^{-1}, \mathcal N) \xrightarrow{P^n_{f_0}} 0.$$

Proof. We only need to verify Condition 3 with some $1/2 < r < s$ so that we
can apply Theorem 9. From Theorem 6 with any such $r$ in place of $s$, we see
that

$$n P^n_{f_0} \int \|f - f_0\|_{-r,2}^2 \, d\Pi(f \mid X^{(n)}) = O(1), \qquad (22)$$

which verifies the first part of Condition 3 for some $M$ large enough using
Markov's inequality. The second part follows from verifying Condition 2 to invoke
Theorem 8: Let $V$ be arbitrary. If $V_J$ is defined as in the proof of Theorem
9, and if $J$ is the smallest integer such that $V \subset V_J$, then

$$\|\pi_V(f - f_0)\|_2^2 \le \|\pi_{V_J}(f - f_0)\|_2^2 \le a_J^{2r}\, \|f - f_0\|_{-r,2}^2$$

so that the second part of Condition 2 follows from the estimate (22) and
Markov's inequality, for $C$ a fixed norm ball of squared diameter of order $a_J^{2r} M^2$.
The first part of Condition 2 follows from the fact that $\Pi \circ \pi_V^{-1}$ is a product
measure in $V$ with bounded marginals $\varphi_{lk}$ constant in $k$, and from the continuity
assumption on $\varphi$. □

Theorem 11. Suppose the assumptions of Theorem 6 are satisfied and that $\varphi$
is continuous in a neighborhood of $\{\theta_{0,lk}\}$ for every $k \in \mathcal Z_l, l \in \mathcal L$. Let $s > 1/2$,
let $Y_n$ be a random variable drawn from $\Pi_n \circ \tau_{X^{(n)}}^{-1}$ (conditional on $X^{(n)}$), and
let $\bar f_n$ be the (Bochner-) mean of the posterior distribution $\Pi(\cdot \mid X^{(n)})$. Then

$$E[Y_n \mid X^{(n)}] = \sqrt n (\bar f_n - X^{(n)}) \xrightarrow{P^n_{f_0}} 0$$

in $H_2^{-s}$ as $n \to \infty$.

Proof. Note that

$$E\big[ \|Y_n\|_{-s,2}^2 \mid X^{(n)} \big] = \int \|h\|_{-s,2}^2 \, d\Pi_n \circ \tau_{X^{(n)}}^{-1}(h)$$

$$\le 2n \int \|f - f_0\|_{-s,2}^2 \, d\Pi(f \mid X^{(n)}) + 2 \|W\|_{-s,2}^2 = O_{P^n_{f_0}}(1)$$

by Theorem 6 and since $\|W\|_{-s,2} < \infty$ almost surely, as after (8). Moreover
$Y_n \to N$ weakly in $H_2^{-s}$ in $P^n_{f_0}$-probability, where $N \sim \mathcal N$, by Theorem 10. By
a standard uniform integrability argument (using that $\{Y_n : n \in \mathbb N\}$ has $H_2^{-s}$-norms
with uniformly bounded second moments and converges to $N$ weakly),
and arguing as in the last paragraph of Section 4.2 below, we conclude

$$E[Y_n \mid X^{(n)}] \to EN$$

in $H_2^{-s}$ in $P^n_{f_0}$-probability, which implies the result since $EN = 0$. □
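In the conjugate Gaussian case the conclusion of Theorem 11 can be made explicit. Assuming priors $N(0, \sigma_l^2)$ on each coordinate, the posterior mean is $\kappa_l X_{lk}$ with $\kappa_l = n\sigma_l^2/(1 + n\sigma_l^2)$, so $\sqrt n(\bar f_n - X^{(n)})$ has coordinates $-\sqrt n (1 - \kappa_l) X_{lk}$. The sketch below computes the expected squared $H_2^{-s}$-norm under the further assumptions $a_l = |\mathcal Z_l| = 2^l$, $\sigma_l = 2^{-l(1/2+\gamma)}$ and a hypothetical truth $\theta_{0,lk} = M\sigma_l$, truncated at a finite number of levels.

```python
import numpy as np


def centred_mean_norm_sq(n, gamma=1.0, s=1.0, M=1.0, levels=40):
    """E || sqrt(n) (posterior mean - X) ||_{-s,2}^2 in the conjugate Gaussian case."""
    l = np.arange(levels)
    sigma2 = 2.0 ** (-l * (1.0 + 2.0 * gamma))          # prior variances sigma_l^2
    shrink = 1.0 / (1.0 + n * sigma2)                   # 1 - kappa_l
    per_coef = shrink**2 * (n * M**2 * sigma2 + 1.0)    # (1 - kappa_l)^2 * E[n X_lk^2]
    level_weight = 2.0 ** (l * (1.0 - 2.0 * s))         # |Z_l| * a_l^{-2s}
    return float(np.sum(level_weight * per_coef))


values = [centred_mean_norm_sq(n) for n in (10**2, 10**4, 10**6)]
print(values)   # decreasing towards zero as n grows
```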

4. Remaining Proofs

4.1. Proofs for Section 2

Theorem 1. By Corollary 6.8.5 in [5] the image measure $\mathcal N \circ (\|\cdot\|_{-s,2})^{-1}$ of $\mathcal N$
under the norm mapping is absolutely continuous on $[0, \infty)$, so the mapping

$$\Phi : t \mapsto \mathcal N(B(0,t)) = \mathcal N \circ (\|\cdot\|_{-s,2})^{-1}([0, t])$$

is uniformly continuous and increasing on $[0, \infty)$. In fact, the mapping is strictly
increasing on $[0, \infty)$: using the results on p. 213-214 in [37], it suffices to show
that any shell $\{f : s < \|f\|_{-s,2} < t\}$, $s < t$, contains an element of the RKHS $L^2$
of $\mathcal N$, which is obvious as $L^2$ is dense in $H_2^{-s}$. Thus $\Phi$ has a continuous inverse
$\Phi^{-1} : [0,1) \to [0,\infty)$. Since $\Phi$ is uniformly continuous for every $\varepsilon > 0$ there
exists $\delta > 0$ small enough such that $|\Phi(t + \delta) - \Phi(t)| < \varepsilon$ for every $t \in [0, \infty)$.
Now

$$\mathcal N(\partial_\delta B(0,t)) = \mathcal N(B(0, t + \delta)) - \mathcal N(B(0, t - \delta)) = |\Phi(t + \delta) - \Phi(t - \delta)| < 2\varepsilon$$

for $\delta > 0$ small enough, independently of $t$. Using (25) below we deduce that
the balls $\{B(0, t)\}_{0 < t < \infty}$ form a $\mathcal N$-uniformity class, and we can thus conclude
from Definition 1 and the results in Section 4.2 below that

$$\sup_{0 < t < \infty} \Big| \Pi\big( f : \|f - X^{(n)}\|_{-s,2} \le t/\sqrt n \mid X^{(n)} \big) - \mathcal N(B(0,t)) \Big| \to 0$$

in $P^n_{f_0}$-probability, as $n \to \infty$. This combined with (9) gives

$$\mathcal N(B(0, M_n)) = \mathcal N(B(0, M_n)) - \Pi\big( f : \|f - X^{(n)}\|_{-s,2} \le M_n/\sqrt n \mid X^{(n)} \big) + 1 - \alpha,$$

which converges to $1 - \alpha$ as $n \to \infty$ in $P^n_{f_0}$-probability, and thus, by the continuous
mapping theorem,

$$M_n \xrightarrow{P^n_{f_0}} \Phi^{-1}(1 - \alpha) \qquad (23)$$

as $n \to \infty$. Now using this last convergence in probability,

$$P^n_{f_0}(f_0 \in C_n) = P^n_{f_0}\big( f_0 \in B(X^{(n)}, M_n/\sqrt n) \big)$$
$$= P^n_{f_0}\big( 0 \in B(W, M_n) \big)$$
$$= P^n_{f_0}\big( 0 \in B(W, \Phi^{-1}(1 - \alpha)) \big) + o(1)$$
$$= \mathcal N\big( B(0, \Phi^{-1}(1 - \alpha)) \big) + o(1)$$
$$= \Phi(\Phi^{-1}(1 - \alpha)) + o(1) = 1 - \alpha + o(1)$$

which completes the proof of the first claim. The second claim follows from the
same arguments combined with (7) which implies that

$$P^n_{f_0}\big( f_0 \in B(\hat f_n, M_n/\sqrt n) \big) - P^n_{f_0}\big( f_0 \in B(X^{(n)}, M_n/\sqrt n) \big) \to 0$$

in $P^n_{f_0}$-probability, as $n \to \infty$. □
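The coverage statement of Theorem 1 can be explored by simulation in a truncated sequence model. The sketch below is only a Monte Carlo illustration under strong assumptions that are not part of the proof: conjugate Gaussian priors $N(0, \sigma_l^2)$, the scaling $a_l = |\mathcal Z_l| = 2^l$, $\sigma_l = 2^{-l(1/2+\gamma)}$, a hypothetical truth $\theta_{0,lk} = \sigma_l/2$, truncation to finitely many levels, and a radius $M_n$ estimated from posterior draws.

```python
import numpy as np


def coverage_neg_sobolev(n, alpha=0.05, s=1.0, gamma=1.0, levels=7,
                         reps=200, draws=1000, seed=0):
    """Monte Carlo frequency of {f_0 in C_n}, with C_n the H_2^{-s}-ball around X^{(n)}
    of posterior mass 1 - alpha."""
    rng = np.random.default_rng(seed)
    lev = np.concatenate([np.full(2**l, l) for l in range(levels)])
    weight = 2.0 ** (-2.0 * s * lev)                  # a_l^{-2s} with a_l = 2^l
    sigma2 = 2.0 ** (-lev * (1.0 + 2.0 * gamma))      # prior variances
    theta0 = 0.5 * np.sqrt(sigma2)                    # hypothetical truth, (P1) with M = 1/2
    kappa = n * sigma2 / (1.0 + n * sigma2)
    hits = 0
    for _ in range(reps):
        w = rng.standard_normal(lev.size)
        x = theta0 + w / np.sqrt(n)
        # Conjugate posterior: theta | X ~ N(kappa * X, kappa / n), coordinate-wise.
        theta = kappa * x + np.sqrt(kappa / n) * rng.standard_normal((draws, lev.size))
        norms = np.sqrt(n * np.sum(weight * (theta - x) ** 2, axis=1))
        m_n = np.quantile(norms, 1.0 - alpha)         # Monte Carlo version of M_n
        # f_0 lies in C_n exactly when ||W||_{-s,2} <= M_n.
        hits += int(np.sqrt(np.sum(weight * w**2)) <= m_n)
    return hits / reps


cov = coverage_neg_sobolev(n=2000)
print(cov)
```

The printed frequency is subject to Monte Carlo error and to the truncation, so it should be read as a rough check against the nominal level $1 - \alpha$ rather than as a precise estimate.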

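Theorem 2. The mapping $\mathcal F : f \mapsto \hat f = (\langle f, e^{2\pi i m \cdot}\rangle)_{|m| \le N, m \in \mathbb Z}$ is linear and continuous
from $H_2^{-s} \to \ell_\infty^N$ in view of $\|\hat f\|_{\infty,N} \le C \|f\|_{-s,2}$ for $C = \max_m \|e^{2\pi i m \cdot}\|_{s,2} <
\infty$. Therefore, by Definition 1 and the continuous mapping theorem we have (as
in Section 4.2 below), as $n \to \infty$,

$$\beta\big( (\Pi_n \circ \tau_{X^{(n)}}^{-1}) \circ \mathcal F^{-1}, \mathcal N \circ \mathcal F^{-1} \big) \xrightarrow{P^n_{f_0}} 0$$

where $\beta$ is the bounded Lipschitz metric for weak convergence in $\ell_\infty^N$. If we
define $\hat W(u) = \int_0^1 e^{2\pi i u t}\, dW(t)$, $u \in \mathbb Z$, then the $\{\hat W(u)\}_{|u| \le N}$ are i.i.d. $N(0, 1)$
so that $\hat W \in \ell_\infty^N$ almost surely. Moreover $\max_{|u| \le N} |\hat W(u)|$, as a finite maximum
of i.i.d. Gaussians, has an absolutely continuous distribution, in fact it is not
difficult to prove that the mapping

$$\Phi_W : t \mapsto \Pr\big( \|\hat W\|_{\infty,N} \le t \big) = \Pr\Big( \max_{|u| \le N} |\hat W(u)| \le t \Big)$$

is uniformly continuous and strictly increasing on $[0, \infty)$ with continuous inverse
$\Phi_W^{-1}$. Conclude that, as in the proof of Theorem 1, the norm balls $\{h \in \ell_\infty^N :
\|h\|_{\infty,N} \le t\}_{t > 0}$ form a $\mathcal N \circ \mathcal F^{-1}$-uniformity class of subsets of $\ell_\infty^N$. Since, by
linearity, $(\Pi_n \circ \tau_{X^{(n)}}^{-1}) \circ \mathcal F^{-1} = (\Pi_n \circ \mathcal F^{-1}) \circ \theta_{\hat X^{(n)}}^{-1}$ where $\theta_{\hat X^{(n)}} : \ell_\infty^N \to \ell_\infty^N$ is given
by $g \mapsto \sqrt n (g - \hat X^{(n)})$, we obtain

$$\mathcal N \circ \mathcal F^{-1}\big( h : \|h\|_{\infty,N} \le M_n \big) - (\Pi_n \circ \mathcal F^{-1}) \circ \theta_{\hat X^{(n)}}^{-1}\big( h : \|h\|_{\infty,N} \le M_n \big) \xrightarrow{P^n_{f_0}} 0$$

and thus $M_n \xrightarrow{P^n_{f_0}} \Phi_W^{-1}(1 - \alpha)$ as $n \to \infty$. Now

$$P^n_{f_0}(\mathcal F f_0 \in C_n) = P^n_{f_0}\big( \|\hat f_0 - \hat X^{(n)}\|_{\infty,N} \le M_n/\sqrt n \big)$$
$$= P^n_{f_0}\big( \|\hat W\|_{\infty,N} \le M_n \big)$$
$$= \mathcal N \circ \mathcal F^{-1}\big( h : \|h\|_{\infty,N} \le \Phi_W^{-1}(1 - \alpha) \big) + o(1)$$
$$= \Phi_W(\Phi_W^{-1}(1 - \alpha)) + o(1) = 1 - \alpha + o(1)$$

which completes the proof of the first part. The second follows as in the proof
of Theorem 1, using (7) and the continuous mapping theorem for $\mathcal F(\cdot)$. □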

Theorem 3. Since $f_0 \in L^1 \cap H_2^s$ we see by Fourier inversion on the circle, the
Cauchy-Schwarz inequality and our assumption on the equivalent Sobolev norm
that

$$\|f * f_0\|_\infty \le \sum_m |\hat f(m)| (1 + |m|)^{-s} (1 + |m|)^{s} |\hat f_0(m)|$$

$$\le \Big( \sum_m |\hat f(m)|^2 (1 + |m|)^{-2s} \Big)^{1/2} \Big( \sum_m |\hat f_0(m)|^2 (1 + |m|)^{2s} \Big)^{1/2} \le c' \|f\|_{-s,2},$$

in particular $f * f_0$, for $f \in H_2^{-s}$, $f_0 \in H_2^s$, defines a continuous function on
$[0, 1)$ (by Fourier inversion), and the mapping $\Lambda : f \mapsto 2 f * f_0$ is linear and
continuous from $H_2^{-s}$ to $C([0,1))$ (this argument is taken from Theorem 1 in
[30]). By Definition 1 and the continuous mapping theorem we thus have

$$\beta\big( (\Pi_n \circ \tau_{X^{(n)}}^{-1}) \circ \Lambda^{-1}, \mathcal N \circ \Lambda^{-1} \big) \xrightarrow{P^n_{f_0}} 0$$

as $n \to \infty$, where $\beta$ is the bounded Lipschitz metric for weak convergence in
$C([0,1))$. Moreover from Corollary 6.8.5 in [5] we deduce as in the proof of
Theorem 1 that norm balls $\{f : \|f\|_\infty \le t\}_{0 < t < \infty}$ are $\mathcal N \circ \Lambda^{-1}$ uniformity classes
for weak convergence, and that the mapping

$$\Phi_\Lambda : t \mapsto \mathcal N \circ \Lambda^{-1}\big( f : \|f\|_\infty \le t \big)$$

from $[0, \infty)$ to $[0, 1)$ is continuous and increasing. In fact, it is strictly increasing,
using the results on p. 213-214 in [37] combined with the fact that the RKHS of
$\mathcal N \circ \Lambda^{-1}$, equal to $L^2 * f_0$, contains functions of arbitrary supremum norm. Denote
by $\Phi_\Lambda^{-1}$ the continuous inverse of $\Phi_\Lambda$. Conclude, as in the previous proofs, that

$$\mathcal N \circ \Lambda^{-1}\big( f : \|f\|_\infty \le M_n \big) - (\Pi_n \circ \Lambda^{-1}) \circ \theta_{X^{(n)} * f_0}^{-1}\big( f : \|f\|_\infty \le M_n \big) \xrightarrow{P^n_{f_0}} 0$$

as $n \to \infty$, where now $\theta_{X^{(n)} * f_0} : g \mapsto \sqrt n (g - 2 X^{(n)} * f_0)$ maps $C([0, 1)) \to C([0, 1))$.

Thus, using the hypotheses on $\hat f_n$ and the posterior contraction rate, the
decomposition

$$f * f - g * g = 2 (f - g) * g + (f - g) * (f - g)$$

and the convolution inequality $\|h * h'\|_\infty \le \|h\|_2 \|h'\|_2$ we see,

$$1 - \alpha = \Pi_n \circ K^{-1}\big( g : \|g - \hat f_n * \hat f_n\|_\infty \le M_n/\sqrt n \big)$$
$$= \Pi_n\big( f : \|f * f - \hat f_n * \hat f_n\|_\infty \le M_n/\sqrt n \big)$$
$$\le \Pi_n\big( f : 2 \|(f - X^{(n)}) * f_0\|_\infty \le M_n/\sqrt n + r_n \big) + o_P(1)$$
$$\le \Pi_n\big( f : 2 \sqrt n\, \|(f - X^{(n)}) * f_0\|_\infty \le M_n + \delta_n \big) + o_P(1),$$

with $\delta_n = r_n \sqrt n = o(1)$ as $n \to \infty$ by assumption. Using the weak convergence
property established above,

$$1 - \alpha \le \Phi_\Lambda(M_n + \delta_n) + o_P(1).$$

Similarly, one obtains the inequality in the other direction

$$1 - \alpha \ge \Phi_\Lambda(M_n - \delta_n) + o_P(1).$$

From this we conclude $M_n \xrightarrow{P^n_{f_0}} \Phi_\Lambda^{-1}(1 - \alpha)$ as $n \to \infty$. Now as above

$$P^n_{f_0}(f_0 * f_0 \in C_n) = P^n_{f_0}\big( \|f_0 * f_0 - \hat f_n * \hat f_n\|_\infty \le M_n/\sqrt n \big)$$
$$= P^n_{f_0}\big( 2 \|(\hat f_n - f_0) * f_0\|_\infty \le M_n/\sqrt n \big) + o(1)$$
$$= P^n_{f_0}\big( 2 \sqrt n\, \|(X^{(n)} - f_0) * f_0\|_\infty \le \Phi_\Lambda^{-1}(1 - \alpha) \big) + o(1)$$
$$= P^n_{f_0}\big( 2 \|W * f_0\|_\infty \le \Phi_\Lambda^{-1}(1 - \alpha) \big) + o(1)$$
$$= \Phi_\Lambda(\Phi_\Lambda^{-1}(1 - \alpha)) + o(1) = 1 - \alpha + o(1)$$

completing the proof. □

Theorem 4. The proof is similar to, in fact simpler than, the previous ones,
using the continuous mapping theorem to deduce

$$\beta_{\mathbb R}\big( (\Pi_n \circ \tau_{X^{(n)}}^{-1}) \circ L^{-1}, \mathcal N \circ L^{-1} \big) \xrightarrow{P^n_{f_0}} 0$$

as $n \to \infty$, and that the intervals $\{[-t, t]\}_{t > 0}$ are $\mathcal N \circ L^{-1}$-uniformity classes for
weak convergence. We leave the details to the reader. □
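Theorem 5. The following notation is used in the proof

$$\theta_n^* = \Psi(f_0) + \Big\langle \dot\Psi_{f_0}, \frac{W}{\sqrt n} \Big\rangle \quad\text{and}\quad \Phi_\Psi(\cdot) = N(0, \|\dot\Psi_{f_0}\|_2^2)((-\infty, \cdot\,]).$$

By definition of the quantile $\mu_n$ it holds

$$\frac{\alpha}{2} = \Pi_n\big( \Psi(f) - \Psi(f_0) \le \mu_n - \Psi(f_0) \big)$$

$$= \Pi_n\Big( \langle \dot\Psi_{f_0}, f - X^{(n)}\rangle \le \mu_n - \theta_n^* - \big[ \Psi(f) - \Psi(f_0) - \langle \dot\Psi_{f_0}, f - f_0\rangle \big] \Big).$$

The assumed contraction of the posterior in an $L^2$-neighborhood of $f_0$ at rate $r_n$
together with (17) and the fact that $\sqrt n\, r_n = o(1)$ imply the existence of $\delta_n \to 0$
such that

$$\frac{\alpha}{2} \le \Pi_n\big( \sqrt n \langle \dot\Psi_{f_0}, f - X^{(n)}\rangle \le \sqrt n (\mu_n - \theta_n^*) + \delta_n \big) + o_P(1),$$

$$\frac{\alpha}{2} \ge \Pi_n\big( \sqrt n \langle \dot\Psi_{f_0}, f - X^{(n)}\rangle \le \sqrt n (\mu_n - \theta_n^*) - \delta_n \big) + o_P(1).$$

Using the continuous mapping theorem and Definition 1,

$$\beta_{\mathbb R}\big( (\Pi_n \circ \tau_{X^{(n)}}^{-1}) \circ \langle \dot\Psi_{f_0}, \cdot\rangle^{-1}, \mathcal N \circ \langle \dot\Psi_{f_0}, \cdot\rangle^{-1} \big) \xrightarrow{P^n_{f_0}} 0$$

as $n \to \infty$. Note that $\mathcal N \circ \langle \dot\Psi_{f_0}, \cdot\rangle^{-1}$ has distribution function $\Phi_\Psi$. Since the
sets $\{(-\infty, t], t \in \mathbb R\}$ form a uniformity class for weak convergence towards a
normal distribution, we obtain

$$\frac{\alpha}{2} \le \Phi_\Psi\big( \sqrt n (\mu_n - \theta_n^*) + \delta_n \big) + o_P(1),$$

$$\frac{\alpha}{2} \ge \Phi_\Psi\big( \sqrt n (\mu_n - \theta_n^*) - \delta_n \big) + o_P(1).$$

From this we deduce the following expansion for $\mu_n$, as $n \to \infty$,

$$\mu_n = \theta_n^* + \frac{1}{\sqrt n} \Phi_\Psi^{-1}\Big( \frac{\alpha}{2} \Big) + o_P(1/\sqrt n).$$

The quantile $\nu_n$ expands similarly, with $\Phi_\Psi^{-1}(\alpha/2)$ replaced by $\Phi_\Psi^{-1}(1 - \alpha/2)$. Now
by definition of $\theta_n^*$,

$$P^n_{f_0}\big( \Psi(f_0) \in (\mu_n, \nu_n] \big) = P^n_{f_0}\Big( \Psi(f_0) \in \Big( \theta_n^* + \frac{\Phi_\Psi^{-1}(\alpha/2)}{\sqrt n} + o_P\Big(\frac{1}{\sqrt n}\Big),\ \theta_n^* + \frac{\Phi_\Psi^{-1}(1 - \alpha/2)}{\sqrt n} + o_P\Big(\frac{1}{\sqrt n}\Big) \Big] \Big)$$

$$= P^n_{f_0}\big( \langle \dot\Psi_{f_0}, W\rangle \in [\Phi_\Psi^{-1}(\alpha/2), \Phi_\Psi^{-1}(1 - \alpha/2)] \big) + o(1) = 1 - \alpha + o(1),$$

completing the proof. □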
4.2. Some Weak Convergence Facts

Let $\mu, \nu$ be Borel probability measures on a separable metric space $(S, d)$. We
call a family $\mathcal U$ of measurable real-valued functions defined on $S$ a $\mu$-uniformity
class for weak convergence if for any sequence $\mu_n$ of Borel probability measures
on $S$ that converges weakly to $\mu$ we also have

$$\sup_{u \in \mathcal U} \Big| \int_S u(s) (d\mu_n - d\mu)(s) \Big| \to 0 \qquad (24)$$

as $n \to \infty$. Necessary and sufficient conditions for classes $\mathcal U$ of functions or sets
$\{1_A : A \in \mathcal A\}$ to form uniformity classes are given in Billingsley and Topsøe [4].
For any subset $A$ of $S$, define $A^\delta = \{x \in S : d(x, A) < \delta\}$ and the $\delta$-boundary
of $A$ by $\partial_\delta A = \{x \in S : d(x, A) < \delta, d(x, A^c) < \delta\}$. A family $\mathcal A$ of measurable
subsets of $S$ is a $\mu$-uniformity class if and only if

$$\lim_{\delta \to 0} \sup_{A \in \mathcal A} \mu(\partial_\delta A) = 0, \qquad (25)$$

see Theorem 2 in [4]. For classes of functions a similar characterisation is available
using moduli of continuity of the involved functions, see Theorem 1 in [4].
In particular the bounded Lipschitz metric

$$\beta_S(\mu, \nu) = \sup_{u \in BL(1)} \Big| \int_S u(s) (d\mu - d\nu)(s) \Big|$$

tests against the class

$$BL(1) = \Big\{ f : S \to \mathbb R,\ \sup_{s \in S} |f(s)| + \sup_{s \ne t,\, s,t \in S} \frac{|f(s) - f(t)|}{d(s,t)} \le 1 \Big\},$$

which is a uniformity class for any probability measure $\mu$. It is well known that
$\beta_S$ metrises weak convergence of probability measures on any separable metric
space (e.g., [11], Theorem 11.3.3).

We conclude with the following observation, which was used repeatedly in
our proofs: Let $\mathcal P(S)$ denote the space of Borel probability measures on $S$, let
$(\Omega, \mathcal A, \mathbb P)$ be a probability space, let $\mu_n : (\Omega, \mathcal A, \mathbb P) \to \mathcal P(S)$, $n \in \mathbb N$, be random
probability measures on $S$, and let $\mu$ be a fixed probability measure on $S$. If
$\beta(\mu_n, \mu) \to^{\mathbb P} 0$ as $n \to \infty$, and if $\mathcal U$ is a $\mu$-uniformity class for weak convergence, then the
convergence in (24) holds true in $\mathbb P$-probability, as is easily proved by contradiction and
passing to almost surely convergent subsequences. Likewise, if $(T, d')$ is a metric
space and $F : S \to T$ is a continuous mapping, then $\beta(\mu_n \circ F^{-1}, \mu \circ F^{-1}) \to^{\mathbb P} 0$.
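Criterion (25) can be checked by hand in the simplest case used in the proofs, half-lines under a normal law. For $A = (-\infty, t]$ on the real line the $\delta$-boundary is the interval $(t - \delta, t + \delta)$, so under $N(0,1)$ the supremum over $t$ of its mass is attained at $t = 0$ and equals $2\Phi(\delta) - 1$. The sketch below approximates the supremum on a grid of $t$ values; the grid and its range are computational assumptions.

```python
from statistics import NormalDist


def max_boundary_mass(delta, grid=2001, t_range=5.0):
    """Approximate sup over t of N(0,1)((t - delta, t + delta)) on a grid of t values."""
    std = NormalDist()
    ts = [-t_range + 2 * t_range * i / (grid - 1) for i in range(grid)]
    return max(std.cdf(t + delta) - std.cdf(t - delta) for t in ts)


for delta in (0.5, 0.1, 0.01):
    print(delta, max_boundary_mass(delta))   # tends to 0 with delta, as (25) requires
```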



References 

[1] A. Araujo and E. Gine. The central limit theorem for real and Banach val- 
ued random variables. John Wiley & Sons, New York-Chichestcr-Brisbane, 
1980. 



/Nonparametric BvM's 



30 



[2] P. J. Bickel and B. J. K. Kleijn. The semiparametric Bernstein-von Miscs 

theorem. Ann. Statist, 40:206-237, 2012. 
[3] P. J. Bickel and Y. Ritov. Nonparametric estimators which can be "plugged- 

in". Ann. Statist., 31(4):1033-1053, 2003. 
[4] P. Billingsley and F. Tops0e. Uniformity in weak convergence. Z. 

Wahrscheinlichkeitstheorie und Verw. Gebiete, 7:1-16, 1967. 
[5] V. I. Bogachcv. Gaussian measures, volume 62 of Mathematical Surveys 

and Monographs. American Mathematical Society, Providence, RI, 1998. 
[6] D. Bontemps. Bernstein-von mises theorems for gaussian regression with 

increasing number of rcgrcssors. Ann. Statist., 39:2557-2584, 2011. 
[7] S. Boucheron and E. Gassiat. A Bernstein-von Mises theorem for discrete 

probability distributions. Electron. J. Stat., 3:114-148, 2009. 
[8] I. Castillo. A semiparametric Bernstein-von Mises theorem for Gaussian 

process priors. Probab. Theory Related Fields, 152:53-99, 2012. 
[9] B. Clarke and S. Ghosal. Reference priors for exponential families with 
increasing dimension. Electron. J. Stat., 4:737-780, 2010. 

[10] A. Cohen, I. Daubechies, and P. Vial. Wavelets on the interval and fast 
wavelet transforms. Appl. Comput. Harmon. Anal., 1(1):54-81, 1993. 

[11] R. M. Dudley. Real analysis and probability, volume 74 of Cambridge
Studies in Advanced Mathematics. Cambridge University Press, Cambridge,
2002. Revised reprint of the 1989 original.

[12] A. Feuerverger and R. A. Mureika. The empirical characteristic function
and its applications. Ann. Statist., 5(1):88-97, 1977.

[13] G. B. Folland. Real analysis. Pure and Applied Mathematics (New York).
John Wiley & Sons Inc., New York, second edition, 1999. Modern
techniques and their applications, A Wiley-Interscience Publication.

[14] E. W. Frees. Estimating densities of functions of observations. J. Amer. 
Statist. Assoc., 89(426):517-525, 1994. 

[15] S. Ghosal. Asymptotic normality of posterior distributions in high- 
dimensional linear models. Bernoulli, 5(2):315-331, 1999. 

[16] S. Ghosal. Asymptotic normality of posterior distributions for exponential
families when the number of parameters tends to infinity. J. Multivariate
Anal., 74(1):49-68, 2000.

[17] S. Ghosal, J. K. Ghosh, and A. W. van der Vaart. Convergence rates of
posterior distributions. Ann. Statist., 28(2):500-531, 2000.

[18] S. Ghosal and A. W. van der Vaart. Convergence rates of posterior
distributions for non-i.i.d. observations. Ann. Statist., 35(1):192-223, 2007.

[19] E. Giné. Invariant tests for uniformity on compact Riemannian manifolds
based on Sobolev norms. Ann. Statist., 3(6):1243-1266, 1975.

[20] E. Giné and D. M. Mason. On local U-statistic processes and the
estimation of densities of functions of several sample variables. Ann. Statist.,
35(3):1105-1145, 2007.

[21] E. Giné and R. Nickl. Uniform central limit theorems for kernel density
estimators. Probab. Theory Related Fields, 141(3-4):333-387, 2008.

[22] E. Giné and R. Nickl. Uniform limit theorems for wavelet density
estimators. Ann. Probab., 37(4):1605-1646, 2009.






[23] E. Giné and R. Nickl. Rates of contraction for posterior distributions in
$L^r$-metrics, $1 \le r \le \infty$. Ann. Statist., 39:2883-2911, 2011.

[24] P. E. Jupp. Data-driven Sobolev tests of uniformity on compact
Riemannian manifolds. Ann. Statist., 36(3):1246-1260, 2008.

[25] J. Kiefer and J. Wolfowitz. Asymptotically minimax estimation of concave
and convex distribution functions. Z. Wahrscheinlichkeitstheorie und Verw.
Gebiete, 34(1):73-85, 1976.

[26] P. S. Laplace. Mémoire sur les formules qui sont fonctions de très grands
nombres et sur leurs applications aux probabilités. Oeuvres de Laplace,
12:301-345, 357-412, 1810.

[27] L. Le Cam. Asymptotic methods in statistical decision theory. Springer
Series in Statistics. Springer-Verlag, New York, 1986.

[28] H. Leahu. On the Bernstein-von Mises phenomenon in the Gaussian white
noise model. Electron. J. Stat., 5:373-404, 2011.

[29] R. Nickl. Donsker-type theorems for nonparametric maximum likelihood
estimators. Probab. Theory Related Fields, 138(3-4):411-449, 2007.
Correction ibid., 141, 331-332.

[30] R. Nickl. On convergence and convolutions of random signed measures. J.
Theoret. Probab., 22(1):38-56, 2009.

[31] V. Rivoirard and J. Rousseau. Bernstein-von Mises theorems for linear
functionals of the density. Preprint, 2011.

[32] A. Schick and W. Wefelmeyer. Root-n consistent density estimators for
sums of independent random variables. J. Nonparametr. Stat.,
16(6):925-935, 2004.

[33] A. Schick and W. Wefelmeyer. Root-n consistent density estimators of
convolutions in weighted $L_1$-norms. J. Statist. Plann. Inference,
137(6):1765-1774, 2007.

[34] X. Shen and L. Wasserman. Rates of convergence of posterior distributions.
Ann. Statist., 29(3):687-714, 2001.

[35] A. W. van der Vaart. Asymptotic statistics, volume 3 of Cambridge Series
in Statistical and Probabilistic Mathematics. Cambridge University Press,
Cambridge, 1998.

[36] A. W. van der Vaart and J. H. van Zanten. Rates of contraction of posterior 
distributions based on Gaussian process priors. Ann. Statist., 36(3):1435- 
1463, 2008. 

[37] A. W. van der Vaart and J. H. van Zanten. Reproducing kernel Hilbert
spaces of Gaussian priors. In Pushing the limits of contemporary statistics:
contributions in honor of Jayanta K. Ghosh, volume 3 of Inst. Math. Stat.
Collect., pages 200-222. Inst. Math. Statist., Beachwood, OH, 2008.

[38] A. W. van der Vaart and J. H. van Zanten. Adaptive Bayesian
estimation using a Gaussian random field with inverse gamma bandwidth. Ann.
Statist., 37(5B):2655-2675, 2009.

[39] A. W. van der Vaart and J. A. Wellner. Weak convergence and empirical
processes. Springer Series in Statistics. Springer-Verlag, New York, 1996.

[40] R. von Mises. Wahrscheinlichkeitsrechnung. Deuticke, Leipzig-Vienna,
1931.



